We propose several statistics to test the Markov hypothesis for
$\beta$-mixing stationary processes sampled at discrete time intervals. Our
tests are based on the Chapman-Kolmogorov equation. We establish
the asymptotic null distributions of the proposed test statistics,
showing that Wilks's phenomenon holds. We compute the power of the test
and provide simulations to investigate the finite sample performance 
of the test statistics when the null model is a diffusion process, with 
alternatives consisting of models with a stochastic mean reversion 
level, stochastic volatility and jumps. 

1. Introduction. Among stochastic processes, those that satisfy the 
Markov property represent an important special case. The Markov prop- 
erty restricts the effective size of the filtration that governs the dynamics 
of the process. In a nutshell, only the current value of X is relevant to 
determine its future evolution. This restriction simplifies model-building, 
forecasting and time series inference. Can it be tested on the basis of dis- 
crete observations? It is not practical to approach the testing problem in the 
form of a restriction on the filtration, the size of any alternative filtration 
being essentially unrestricted. Furthermore, the continuous-time filtration is 
not observable on the basis of discrete observations, especially if we do not 
have high-frequency data, and asymptotically the sampling interval remains 
fixed. 

Instead, we propose to test the Markov property at the level of
the discrete-frequency transition densities of the process. Given a
time-homogeneous stochastic process $X = \{X_t\}_{t \ge 0}$ on $\mathbb{R}^m$,
with the standard probability space $(\Omega, \mathcal{F}, P)$ and filtration
$\mathcal{F}_t \subset \mathcal{F}$, we consider families of conditional
probability functions $P(\cdot|x, \Delta)$ of $X_{t+\Delta}$ given $X_t = x$:
for each Borel measurable function $\psi$,
$E[\psi(X_{t+\Delta})|\mathcal{F}_t] = \int \psi(y) P(dy|X_t, \Delta)$.

If X is time-homogeneous Markovian, then its transition densities satisfy
the Chapman-Kolmogorov equation

(1)    $P(\cdot|x, \Delta + \tau) = \int_{z \in S} P(\cdot|z, \Delta) P(dz|x, \tau)$

for all $\Delta > 0$ and $\tau > 0$ and x in the support S of X. Suppose that we
collect n observations on X on [0, T] sampled every $\Delta$ units of time. We
will assume that $\Delta$ is fixed; asymptotics are therefore with $T \to \infty$.
High-frequency asymptotics, by contrast, assume that $\Delta \to 0$, and T can be
fixed or diverge. This asymptotic setup could have been considered, but it
is not necessary here as we are able to test the hypothesis on the basis
of discrete data at a fixed interval with no requirement for high-frequency
data; high-frequency asymptotics would, of course, also generate different
asymptotic properties for the tests we propose.

If we set $\tau = \Delta$ in (1), then we can estimate the transition densities at
the desired frequencies on the basis of these discrete observations. On the
left-hand side of the equation, the transition density at interval $2\Delta$ can
be estimated simply by retaining every other observation in the same data
sample. To avoid unnecessary restrictions on the data-generating process,
we will employ nonparametric estimators of the transition densities. Given
these, equation (1) then becomes a testable implication of the Markov
property for X.

Conversely, Kolmogorov's construction (see, e.g., [28], Chapter III,
Theorem 1.5) allows one to parameterize Markov processes using transition
functions. Namely, given a transition function P and a probability measure $\pi$
on $\mathbb{R}^m$ serving as the initial distribution, there exists a unique
probability measure such that the coordinate process X is Markovian with respect
to $\sigma(X_u, u \le t)$, has transition function P and $X_0$ has $\pi$ as its
distribution. When $\pi$ is the invariant probability measure of P, the process
is a stationary Markov process. Therefore, given an initial distribution, a
Markov process X is determined by its transition densities.

Transition densities play a crucial role in many contexts. In mathematical
finance, arbitrage considerations make many pricing problems
linear; as a result, they depend upon the computation of conditional
expectations for which knowledge of the transition function is essential. Also,
inference strategies relying on maximum-likelihood or Bayesian methods re- 
quire the transition density of the process. Specification testing procedures 
for stochastic processes also make use of the transition densities (see, e.g., 
[1, 3, 7, 8, 18] and [24]). All these models, estimation methods and tests 
assume that the process is Markovian. 
Stochastic volatility models are a very broad class of non-Markovian
models, due to the latency of the volatility state variable. They have been
popular in financial asset pricing and modeling (see, e.g., [17]). Parameters in
stochastic volatility models are much harder to estimate and the associated 
pricing formulas are also different from those based on Markovian diffusion 
models and depend on the assumptions made on the correlation structure 
between the innovations to prices and volatility (as in, e.g., [23]). Other ex- 
amples include models for the term structure of interest rates, which may 
be Markovian or not (see, e.g., [22]), and, in fact, one popular approach 
in mathematical finance consists of restricting term structure models to 
be Markovian (see, e.g., [6]). In other words, many financial econometrics 
models are based on the Markovian assumption and this fundamental as- 
sumption needs to be tested before they can be applied. In all these cases, 
testing whether the underlying process is Markovian is essential in helping 
to decide which family of models to use and whether a diffusion model is 
adequate. 

We will propose test statistics for this purpose. Asymptotic null distribu- 
tions of test statistics are established and we show that Wilks's phenomenon 
holds for several of those test statistics. The power functions of the tests are 
also computed for contiguous alternatives. We find that the proposed tests 
can detect alternatives with an optimal rate in the context of nonparametric 
testing procedures. 

The remainder of the paper is organized as follows. In Section 2, we briefly 
describe the nonparametric estimation of the transition functions of the pro- 
cess. In Section 3, we propose several test statistics for checking the Markov 
hypothesis. In Section 4, we establish their asymptotic null distributions and 
compute their power. Simulation results are reported in Section 5. Technical 
conditions and proofs of the mathematical results are given in Section 6. 

2. Nonparametric estimation of the transition density and distribution
functions. To estimate nonparametrically the transition density of the observed
process X, we use the locally linear method suggested by [14]. The process
X is sampled at regular time points $\{i\Delta, i = 1, \ldots, n + 2\}$. We make
the dependence of the transition function and related quantities on $\Delta$
implicit by redefining

    $X_i = X_{i\Delta}, \quad i = 1, \ldots, n + 2,$

which is assumed to be a stationary and $\beta$-mixing process.

For ease of exposition, we describe the estimation of the transition density
and distribution when m = 1, that is, X is a process on the line. We also
define $Y_i = Y_{i\Delta} = X_{(i+1)\Delta}$ and $Z_i = Z_{i\Delta} = X_{(i+2)\Delta}$.
Let $b_1$ and $b_2$ denote two bandwidths and K and W two kernel functions.
Observe that as $b_2 \to 0$,

(2)    $E[K_{b_2}(Z_i - z)|Y_i = y] \approx p(z|y, \Delta),$

where $K_{b_2}(z) = K(z/b_2)/b_2$ and $p(z|y, \Delta)$ is the transition density of
$X_{(i+1)\Delta}$ given $X_{i\Delta}$. The left-hand side of (2) is the regression
function of the random variable $K_{b_2}(Z_i - z)$ given $Y_i = y$. Hence, a locally
linear fit can be used to estimate this regression function. For each given
y and z, one minimizes

(3)    $\sum_{i=1}^n \{K_{b_2}(Z_i - z) - \alpha - \beta(Y_i - y)\}^2 W_{b_1}(Y_i - y)$

with respect to the local parameters $\alpha$ and $\beta$, where $W_{b_1}(z) = W(z/b_1)/b_1$.
The resulting estimate of the conditional density is simply $\hat\alpha$. The estimator
can be explicitly expressed as

(4)    $\hat p(z|y, \Delta) = n^{-1} \sum_{i=1}^n W_n(Y_i - y, y; b_1) K_{b_2}(Z_i - z),$

where $W_n$ is the effective kernel induced by the local linear fit. Explicitly, it
is given by

    $W_n(z, y; b_1) = W_{b_1}(z) \frac{s_{n,2}(y) - b_1^{-1} z s_{n,1}(y)}{s_{n,0}(y) s_{n,2}(y) - s_{n,1}^2(y)},$

where

    $s_{n,j}(y) = \frac{1}{n} \sum_{i=1}^n \left(\frac{Y_i - y}{b_1}\right)^j W_{b_1}(Y_i - y).$

Note that the effective kernel $W_n$ depends on the sampling data points
and the location y. This is the key to the design adaptation and location
adaptation property of the locally linear fit.
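The construction in (3) and (4) can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes an Epanechnikov kernel for both W and K (one possible symmetric density with bounded support), uses a simulated AR(1) path as a stand-in for the sampled process, and the function names `effective_kernel_weights` and `transition_density_hat` are hypothetical.

```python
import math
import random


def epanechnikov(u):
    """Epanechnikov kernel: a symmetric, bounded density with support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def effective_kernel_weights(ys, y, b1, kernel=epanechnikov):
    """Return W_n(Y_i - y, y; b1) for every observation Y_i."""
    n = len(ys)
    # Rescaled kernel values W_{b1}(Y_i - y) = W((Y_i - y) / b1) / b1.
    wb = [kernel((yi - y) / b1) / b1 for yi in ys]
    # Moments s_{n,j}(y) = n^{-1} sum_i ((Y_i - y) / b1)^j W_{b1}(Y_i - y).
    s = [sum(w * ((yi - y) / b1) ** j for w, yi in zip(wb, ys)) / n
         for j in range(3)]
    denom = s[0] * s[2] - s[1] ** 2
    if denom <= 0.0:
        raise ValueError("too few observations near y for a local linear fit")
    return [w * (s[2] - ((yi - y) / b1) * s[1]) / denom
            for w, yi in zip(wb, ys)]


def transition_density_hat(ys, zs, y, z, b1, b2, kernel=epanechnikov):
    """Estimator (4): n^{-1} sum_i W_n(Y_i - y, y; b1) K_{b2}(Z_i - z)."""
    weights = effective_kernel_weights(ys, y, b1, kernel)
    return sum(w * kernel((zi - z) / b2) / b2
               for w, zi in zip(weights, zs)) / len(ys)


# Illustrative data only: an AR(1) path playing the role of the sampled process.
random.seed(0)
path = [0.0]
for _ in range(501):
    path.append(0.9 * path[-1] + random.gauss(0.0, 1.0))
ys, zs = path[1:-1], path[2:]          # pairs (Y_i, Z_i) one step apart
w = effective_kernel_weights(ys, 0.0, 0.8)
p_hat = transition_density_hat(ys, zs, 0.0, 0.0, 0.8, 0.8)
```

Two exact properties of the local linear fit make a convenient sanity check: the weights average to one and they annihilate the linear term, that is, $n^{-1} \sum_i W_n(Y_i - y, y; b_1) = 1$ and $n^{-1} \sum_i W_n(Y_i - y, y; b_1)(Y_i - y) = 0$.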

From (4), a possible estimate of the transition distribution
$P(z|y, \Delta) = P(Z_i < z|Y_i = y, \Delta)$ is given by

    $\hat P(z|y, \Delta) = \int_{-\infty}^z \hat p(t|y, \Delta)\,dt = \frac{1}{n} \sum_{i=1}^n W_n(Y_i - y, y; b_1) \bar K\left(\frac{Z_i - z}{b_2}\right),$

where $\bar K(u) = \int_u^\infty K(t)\,dt$. Let $b_2 \to 0$, then

(5)    $\hat P(z|y, \Delta) = \frac{1}{n} \sum_{i=1}^n W_n(Y_i - y, y; b_1) I(Z_i < z),$

where we drop the term in which $Z_i = z$ would contribute the value $\bar K(0)$.
This does not affect the asymptotic property of $\hat P$. Actually, (5) is really the
locally linear estimator of the regression function

    $P(z|y, \Delta) = E[I(Z_i < z)|Y_i = y].$



TESTING THE MARKOV HYPOTHESIS 






3. Nonparametric tests for the Markov hypothesis in discretely sampled
continuous-time models. The tests we propose are based on the fact that,
for X to be Markovian, its transition function must satisfy the
Chapman-Kolmogorov equation in the form for densities equivalent to (1),

(6)    $p(z|x, 2\Delta) = r(z|x, 2\Delta),$

where

(7)    $r(z|x, 2\Delta) = \int_{y \in S} p(z|y, \Delta) p(y|x, \Delta)\,dy$

for all $(x, z) \in S^2$.

Under time-homogeneity of the process X, the Markov hypothesis can
then be tested in the form $H_0$ against $H_1$, where

(8)    $H_0: p(z|x, 2\Delta) - r(z|x, 2\Delta) = 0$ for all $(x, z) \in S^2$,
       $H_1: p(z|x, 2\Delta) - r(z|x, 2\Delta) \ne 0$ for some $(x, z) \in S^2$.

This test corresponds to a nonparametric null hypothesis versus a
nonparametric alternative hypothesis.

Both $p(y|x, \Delta)$ and $p(z|x, 2\Delta)$ can be estimated from data sampled at
interval $\Delta$, thanks to time homogeneity. In fact, the successive pairs of
observed data $\{(X_i, Y_i)\}_{i=1}^{n+1}$ form a sample from the distribution with
conditional density $p(y|x, \Delta)$ from which the estimator $\hat p(y|x, \Delta)$ can be
constructed, and then $\hat r(z|x, 2\Delta)$ as indicated in equation (7) can be computed.
Meanwhile, the successive pairs $(X_1, Z_1), (X_2, Z_2), \ldots,$ form a sample from
the distribution with conditional density $p(z|x, 2\Delta)$ which can be used to
form the direct estimator by drawing a parallel to (4)

    $\hat p(z|x, 2\Delta) = \frac{1}{n} \sum_{i=1}^n W_n(X_i - x, x; h_1) K_{h_2}(Z_i - z),$

where $h_1$ and $h_2$ are two bandwidths, localizing, respectively, the x- and
z-domain.

In other words, the test compares a direct estimator of the $2\Delta$-interval
conditional density, $\hat p(z|x, 2\Delta)$, to an indirect estimator of the $2\Delta$-interval
conditional density, $\hat r(z|x, 2\Delta)$, obtained by (7). If the process is actually
Markovian, then the two estimates should be close (for some distance
measure) in a sense made precise by the use of the statistical distributions of
these estimators.
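For a process that is known to be Markovian, the equality (6) can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the Ornstein-Uhlenbeck process with the parameter values used later in the simulation section, relies on the standard fact that its transition law is Gaussian, and approximates the integral in (7) with a plain trapezoidal rule. The function names are hypothetical.

```python
import math

# Parameter values of the Ornstein-Uhlenbeck null model in Section 5.
KAPPA, ALPHA, SIGMA, DELTA = 0.2, 0.085, 0.08, 1.0 / 52.0


def ou_density(z, x, lag):
    """Gaussian transition density of the Ornstein-Uhlenbeck process over `lag`."""
    decay = math.exp(-KAPPA * lag)
    mean = ALPHA + (x - ALPHA) * decay
    var = SIGMA ** 2 * (1.0 - decay ** 2) / (2.0 * KAPPA)
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def chapman_kolmogorov_rhs(z, x, lag, half_width=10.0, steps=4000):
    """Trapezoidal approximation of r(z|x, 2 lag) = int p(z|y, lag) p(y|x, lag) dy."""
    decay = math.exp(-KAPPA * lag)
    centre = ALPHA + (x - ALPHA) * decay
    sd = math.sqrt(SIGMA ** 2 * (1.0 - decay ** 2) / (2.0 * KAPPA))
    lo, hi = centre - half_width * sd, centre + half_width * sd
    step = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * step
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * ou_density(z, y, lag) * ou_density(y, x, lag)
    return total * step


x0, z0 = 0.07, 0.09
direct = ou_density(z0, x0, 2.0 * DELTA)           # p(z|x, 2 Delta)
indirect = chapman_kolmogorov_rhs(z0, x0, DELTA)   # r(z|x, 2 Delta)
```

Here `direct` and `indirect` agree up to quadrature error, which is the population version of the comparison that the tests carry out with estimated densities.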

If, instead of $2\Delta$ transitions, we test the replicability of $j\Delta$ transitions,
where j is an integer greater than or equal to 2, there is no need to explore
all the possible combinations of these $j\Delta$ transitions in terms of shorter
ones (1, j - 1), (2, j - 2), ...: verifying equation (6) for one combination is
sufficient, as can be seen by a recursion argument. In the event of a rejection
of $H_0$ in (8), there is no need to consider transitions of order j. In general, a
vector of "transition equalities" can be tested in a single pass in a method of
moments framework with as many moment conditions as transition intervals.

We propose two classes of tests for the hypothesis problem (8) based on
nonparametric estimation of the transition densities and distributions. To
be more specific, since

(9)    $r(z|x, 2\Delta) = E[p(z|Y_i, \Delta)|X_i = x],$

the function $r(z|x, 2\Delta)$ can also be estimated by regressing
nonparametrically $p(z|Y_i, \Delta)$ on $X_i$. This avoids integration in (7) and makes
implementation and theoretical studies easier. Employing the local linear smoother
for (9), we obtain the following estimator:

    $\hat r(z|x, 2\Delta) = n^{-1} \sum_{i=1}^n W_n(X_i - x, x; h_3) \hat p(z|Y_i, \Delta),$

where $h_3$ is a bandwidth in this smoothing problem. Under $H_0$ in (8), the
logarithm of the likelihood function is estimated as

    $\ell(H_0) = \sum_{i=1}^n \log \hat r(Z_i|X_i, 2\Delta),$

after ignoring the initial stationary density $\pi(X_1)$. This likelihood can be
compared with

    $\ell(H_1) = \sum_{i=1}^n \log \hat p(Z_i|X_i, 2\Delta),$

which leads to the generalized likelihood ratio (GLR) test statistic (see [16])

    $\sum_{i=1}^n \log\{\hat r(Z_i|X_i, 2\Delta)/\hat p(Z_i|X_i, 2\Delta)\}.$

Since the nonparametric regression functions cannot be estimated well when
$(X_i, Z_i)$ is in the boundary region, the above GLR test statistic is reduced
to

    $T = \sum_{i=1}^n \log\{\hat r(Z_i|X_i, 2\Delta)/\hat p(Z_i|X_i, 2\Delta)\} w^*(X_i, Z_i),$

where $w^*$ is a weight function selected to reduce the influences of the
unreliable estimates in the sparse region. Admittedly, $\ell(H_1)$ is not the estimated
log-likelihood under $H_1$ in (8), but is used to create a discrepancy measure.
To see this, note that under $H_0$, $\hat p$ and $\hat r$ are approximately the same. By
Taylor's expansion, we have

    $-T \approx \sum_{i=1}^n \frac{\hat p(Z_i|X_i, 2\Delta) - \hat r(Z_i|X_i, 2\Delta)}{\hat p(Z_i|X_i, 2\Delta)} w^*(X_i, Z_i) + \frac{1}{2} \sum_{i=1}^n \left\{\frac{\hat p(Z_i|X_i, 2\Delta) - \hat r(Z_i|X_i, 2\Delta)}{\hat p(Z_i|X_i, 2\Delta)}\right\}^2 w^*(X_i, Z_i).$

To avoid unnecessary technicalities, we ignore the first term and consider
the second term (dropping the constant factor 1/2)

(10)   $T_1^* = \sum_{i=1}^n \left\{\frac{\hat p(Z_i|X_i, 2\Delta) - \hat r(Z_i|X_i, 2\Delta)}{\hat p(Z_i|X_i, 2\Delta)}\right\}^2 w^*(X_i, Z_i),$

which is the $\chi^2$-type of test statistic. A natural alternative statistic to $T_1^*$
is

(11)   $T_1 = \sum_{i=1}^n \{\hat p(Z_i|X_i, 2\Delta) - \hat r(Z_i|X_i, 2\Delta)\}^2 w(X_i, Z_i).$

The resulting test statistics $T_1^*$ and $T_1$ are discrepancy measures between $\hat p$
and $\hat r$ in the $L_2$-distance. Test statistics based on discrepancy measures have
received attention and achieved success in the literature. Other discrepancy norms
such as the $L_\infty$-distance can also be investigated in the current setting. See
the seminal work by [4, 5] and [21]. They are not qualitatively different, as
shown in the classical goodness-of-fit tests.

Since the testing problem (8) is equivalent to the following testing
problem:

(12)   $H_0: P(z|x, 2\Delta) - R(z|x, 2\Delta) = 0$ for all $(x, z) \in S^2$,
       $H_1: P(z|x, 2\Delta) - R(z|x, 2\Delta) \ne 0$ for some $(x, z) \in S^2$,

with, in light of (9),

    $R(z|x, 2\Delta) = \int_{-\infty}^z r(t|x, 2\Delta)\,dt = E\{P(z|Y, \Delta)|X = x\},$

then transition distribution-based tests can be formulated too. Let $\hat P(z|x, 2\Delta)$
be the direct estimator for the $2\Delta$-transition distribution

(13)   $\hat P(z|x, 2\Delta) = \frac{1}{n} \sum_{i=1}^n W_n(X_i - x, x; h_1) I(Z_i < z).$

Regressing the transition distribution $\hat P(z|X_j, \Delta)$ on $X_{j-1}$ yields $\hat R(z|x, 2\Delta)$:

(14)   $\hat R(z|x, 2\Delta) = n^{-1} \sum_{i=1}^n W_n(X_i - x, x; h_3) \hat P(z|Y_i, \Delta),$

where $\hat P(z|y, \Delta) = n^{-1} \sum_{i=1}^n W_n(Y_i - y, y; h_1) I(Z_i < z)$. Similarly to (11), for
the testing problem (12), the transition distribution-based test will be

(15)   $T_2 = \sum_{i=1}^n \{\hat P(Z_i|X_i, 2\Delta) - \hat R(Z_i|X_i, 2\Delta)\}^2 \omega(X_i),$

where the weight function $\omega(\cdot)$ is chosen to depend on only the x-variable,
because $\hat P(z|x, 2\Delta)$ is a nonparametric estimator of the conditional distribution
function, and we need only to weight down the contribution from the sparse
regions in the x-coordinate.

Note that the test statistic $T_2$ involves only one-dimensional smoothing.
Hence, it is expected to be more stable than $T_1$, and the null distribution
of $T_2$ can be better approximated by the asymptotic null distribution. This
will be justified by the theorems in the next section.

The choice between the transition density and distribution-based tests 
reflects different degrees of smoothness of alternatives that we wish to test. 
In a simpler problem of the traditional goodness-of-fit tests, this has been 
thoroughly studied in [10]. Essentially, the transition density-based tests are 
more powerful in detecting local deviations whereas the transition distribution- 
based tests are more powerful for detecting global deviations. 
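The distribution-based statistic in (15) can be prototyped directly from the local linear weights. The following is a rough sketch rather than the authors' implementation: it uses a Gaussian kernel purely for numerical convenience (the theory assumes kernels with bounded support), an indicator weight function, a simulated AR(1) path as illustrative data, a naive cubic-time loop, and the hypothetical names `local_linear_weights` and `t2_statistic`.

```python
import math
import random


def gauss_kernel(u):
    """Gaussian kernel (an assumption made here for numerical convenience)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_linear_weights(points, x, h):
    """Return n^{-1} W_n(points[i] - x, x; h) for i = 1, ..., n."""
    n = len(points)
    u = [(p - x) / h for p in points]
    wh = [gauss_kernel(ui) / h for ui in u]
    s0 = sum(wh) / n
    s1 = sum(w * ui for w, ui in zip(wh, u)) / n
    s2 = sum(w * ui * ui for w, ui in zip(wh, u)) / n
    denom = n * (s0 * s2 - s1 * s1)
    return [w * (s2 - ui * s1) / denom for w, ui in zip(wh, u)]


def t2_statistic(xs, ys, zs, h1, h3, omega):
    """Sketch of T_2 in (15), built from the estimators (13) and (14)."""
    n = len(xs)
    # Weights of the Delta-transition distribution estimate at each y = Y_i.
    wy = [local_linear_weights(ys, yi, h1) for yi in ys]
    total = 0.0
    for a in range(n):
        weight = omega(xs[a])
        if weight == 0.0:
            continue
        below = [1.0 if zk < zs[a] else 0.0 for zk in zs]   # I(Z_k < Z_a)
        # (13): direct estimate of P(Z_a | X_a, 2 Delta).
        w1 = local_linear_weights(xs, xs[a], h1)
        p_direct = sum(w * b for w, b in zip(w1, below))
        # (14): regress P_hat(Z_a | Y_i, Delta) on X_i, evaluated at x = X_a.
        p_delta = [sum(w * b for w, b in zip(wy_i, below)) for wy_i in wy]
        w3 = local_linear_weights(xs, xs[a], h3)
        r_hat = sum(w * p for w, p in zip(w3, p_delta))
        total += (p_direct - r_hat) ** 2 * weight
    return total


# Illustrative Markovian data: triples (X_i, Y_i, Z_i) from an AR(1) path.
random.seed(1)
path = [0.0]
for _ in range(121):
    path.append(0.5 * path[-1] + random.gauss(0.0, 1.0))
xs, ys, zs = path[:-2], path[1:-1], path[2:]
t2 = t2_statistic(xs, ys, zs, h1=0.5, h3=0.5,
                  omega=lambda x: 1.0 if abs(x) <= 1.5 else 0.0)
```

Only one-dimensional smoothers appear in the sketch, which is the structural reason given above for the greater stability of $T_2$. The bandwidths 0.5 and the cut-off 1.5 are arbitrary illustrative values, not recommendations.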

4. Asymptotic properties. 

4.1. Assumptions. We assume the following conditions. These conditions 
are frequently imposed for nonparametric studies for dependent data. 

Assumption (A1). The observed time series $\{X_i\}_{i=1}^{n+2}$ is strictly
stationary with time-homogeneous $j\Delta$-transition density $p(X_{i+j}|X_i, j\Delta)$.

Assumption (A2). The kernel functions W and K are symmetric and
bounded densities with bounded supports, and satisfy the Lipschitz
condition.

Assumption (A3). The weight function w(x, z) has a continuous
second-order derivative with a compact support $\Omega^*$.

Assumption (A4). The stationary process $\{X_i\}$ is $\beta$-mixing with the
exponential decay rate $\beta(n) = O(e^{-\lambda n})$ for some $\lambda > 0$.

Assumption (A5). The functions $p(y|x; \Delta)$ and $p(z|x; 2\Delta)$ have continuous
second-order partial derivatives with respect to (x, y) and (x, z) on the
set $\Omega^*$. The invariant density $\pi(x)$ of $\{X_i\}$ has a continuous second-order
derivative for $x \in \Omega_x^*$, a projection of the set $\Omega^*$ onto the x-axis. Moreover,
$\pi(x) > 0$, $p(y|x, \Delta) > 0$ and $p(z|x, 2\Delta) > 0$ for all $(x, y) \in \Omega^*$ and $(x, z) \in \Omega^*$.

Assumption (A6). The joint density $p_{1\ell}(x_1, x_\ell)$ of $(X_1, X_\ell)$ for $\ell > 1$
is bounded by a constant independent of $\ell$. Put
$g_{1\ell}(x_1, x_\ell) = p_{1\ell}(x_1, x_\ell) - \pi(x_1)\pi(x_\ell)$. The function $g_{1\ell}$ satisfies the
Lipschitz condition: for all (x', y') and (x, y) in $\Omega^*$,

    $|g_{1\ell}(x, y) - g_{1\ell}(x', y')| \le C \sqrt{(x - x')^2 + (y - y')^2}.$

Assumption (A7). The bandwidths h^s and hi are of the same order 
and satisfy nh\/\ogn — > oo and nh\ — > 0. 



Assumption (A8). The bandwidth h\ converges to zero in such a way 
that nh^ 2 — > and nh^ 2 — > oo 



4.2. Asymptotic null distributions. To introduce our asymptotic results,
we need the following notation. For any integrable function f(x), let
$\|f\|^2 = \int f^2(x)\,dx$ and

    $s(z|x, 2\Delta) = \int p^2(z|y, \Delta) p(y|x, \Delta)\,dy = E[p^2(z|Y_1, \Delta)|X_1 = x].$

Note that the sampled observations $\{X_{n+2-i}\}_{i=0}^{n+1}$ are a reverse Markov
process under the null model. We also use $p^*(x|z, 2\Delta)$ to denote the
$2\Delta$-transition density of the reverse process, and let

    $s^*(x|z, 2\Delta) = \int p^{*2}(y|z, \Delta) p^*(x|y, \Delta)\,dy.$

Denote by

    $\Omega_{11} = \int w(x, z) p^2(z|x, 2\Delta)\,dx\,dz,$

    $\Omega_{12} = \int w(x, z) p^3(z|x, 2\Delta)\,dx\,dz,$

    $\Omega_{13} = \int w(x, z) s(z|x, 2\Delta) p(z|x, 2\Delta)\,dx\,dz,$

    $\Omega_{14} = \int w(x, z) r^2(z|x, 2\Delta) p(z|x, 2\Delta)\,dx\,dz,$

    $\Omega_{15} = \int w(x, z) s^*(x|z, 2\Delta) p^*(x|z, 2\Delta) [\pi(z)/\pi(x)]^2\,dx\,dz,$

    $\Omega_2 = \int w^2(x, z) p^4(z|x, 2\Delta)\,dx\,dz.$

For a kernel function $K(\cdot)$, let $K^*(\cdot) = K * K(\cdot)$ and $K_h(\cdot) = h^{-1} K(\cdot/h)$.
Denote by V(x, z) the conditional variance function of $P(z|Y, \Delta)$, given X = x.
Then it is easy to see that

    $\Omega_{13} - \Omega_{14} = \int w(x, z) V(x, z) p(z|x, 2\Delta)\,dx\,dz = E\{V(X, Z) w(X, Z)|X = x\}.$

Throughout the paper, we use the notation $T_n \sim \chi^2_{a_n}$ for a diverging sequence
of constants $a_n$ to represent that

    $(T_n - a_n)/\sqrt{2 a_n} \xrightarrow{\mathcal{L}} N(0, 1).$

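Theorem 1. Assume Conditions (A1)-(A7) hold. If $\{X_i\}$ is Markovian,

    $(T_1 - \mu_1)/\sigma_1 \xrightarrow{\mathcal{L}} N(0, 1),$

where

    $\mu_1 = \Omega_{11} \|W\|^2 \|K\|^2/(h_1 h_2) - \Omega_{12} \|W\|^2/h_1 + (\Omega_{13} - \Omega_{14}) \|W\|^2/h_3 + \Omega_{15} \|K\|^2/b_2,$

and $\sigma_1^2 = 2\Omega_2 \|W * W\|^2 \|K * K\|^2/(h_1 h_2)$. Furthermore, $r_1 T_1 \sim \chi^2_{a_n}$, where
$a_n = r_1 \mu_1$ and $r_1 = 2\mu_1/\sigma_1^2$.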
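The rescaling in Theorem 1 is chosen so that $r_1 T_1$ matches the first two moments of a chi-square variable: with $r = 2\mu/\sigma^2$ and $a = r\mu$, the variable $rT$ has mean $a$ and variance $r^2\sigma^2 = 2a$. A small sketch makes this concrete; the numbers are illustrative only and the helper names are hypothetical.

```python
import math


def chi_square_rescaling(mu, sigma):
    """Return (r, a) with r = 2 mu / sigma^2 and a = r mu."""
    r = 2.0 * mu / sigma ** 2
    return r, r * mu


def normal_upper_tail(z):
    """P(N(0, 1) >= z), used for an approximate one-sided p-value."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


mu, sigma, t_obs = 50.0, 10.0, 65.0    # illustrative numbers only
r, a = chi_square_rescaling(mu, sigma)
z_direct = (t_obs - mu) / sigma                      # (T - mu) / sigma
z_rescaled = (r * t_obs - a) / math.sqrt(2.0 * a)    # (rT - a) / sqrt(2a)
p_value = normal_upper_tail(z_direct)
```

The two standardized values coincide, so the normal statement and the chi-square statement of the theorem describe the same approximation.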

The test statistic $T_1^*$, as far as its null distribution is concerned, can
be regarded as a special case of $T_1$, with the weight function
$w(x, z) = p^{-2}(z|x, 2\Delta) w^*(x, z)$. Correspondingly, let $\Omega^*_{1j}$ denote $\Omega_{1j}$ with
$w(x, z)$ replaced by $p^{-2}(z|x, 2\Delta) \times w^*(x, z)$ and $\Omega^*_2$ defined similarly. Then, we have
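Corollary 1. Under the conditions in Theorem 1 with w replaced by
$w^*$, $r_1^* T_1^* \sim \chi^2_{a_n^*}$, where

    $r_1^* = \frac{\Omega^*_{11} \|W\|^2 \|K\|^2}{\Omega^*_2 \|W * W\|^2 \|K * K\|^2} (1 + o(1)),$

    $a_n^* = \frac{\Omega^{*2}_{11} \|W\|^4 \|K\|^4}{\Omega^*_2 \|W * W\|^2 \|K * K\|^2 h_1 h_2} (1 + o(1)).$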

The factor $r_1^*$ is asymptotically a constant depending on only the kernels and
the weight function. The degree of freedom $a_n^*$ is independent of nuisance
parameters. This reflects that the Wilks phenomenon continues to hold in
the current situation.
Theorem 2. Under Conditions (A1)-(A6) and (A8), if $\{X_i\}$ is Markovian,

    $(T_2 - \mu_2)/\sigma_2 \xrightarrow{\mathcal{L}} N(0, 1),$



where 



= T^HI^II 2 fu(x){l + Gh^EiViX^ Z A )\X A = x]}dx, 
6fti J 

and $\sigma_2^2 = \|W * W\|^2 \|\omega\|^2/(45 h_1)$. Furthermore, $r_2 T_2 \sim \chi^2_{b_n}$, where
$b_n = r_2 \mu_2$ and $r_2 = 2\mu_2/\sigma_2^2$.

Comparing Theorems 1 and 2, it is seen that the asymptotic variance of $T_1$ is
an order of magnitude larger than that of $T_2$. Therefore, the null distribution
of $T_2$ can be more stably approximated than that of $T_1$. On the other hand,
the degrees of freedom in $T_1$ are larger than in $T_2$, and the transition
density-based tests are more omnibus, capable of testing a wider class of alternative
hypotheses.

4.3. Power under contiguous alternative models. To assess the power of
the tests, we consider the following contiguous alternative sequence for $T_1$:

(16)   $H_{1n}: p(z|x, 2\Delta) - r(z|x, 2\Delta) = g_n(x, z),$

where $g_n$ satisfies $E[g_n^2(X, Z)] = O(\delta_n^2)$ and
$\mathrm{var}[g_n^2(X, Z)] \le M (E[g_n^2(X, Z)])^2$ for a constant $M > 0$ and a sequence
$\delta_n$ going to zero as $n \to \infty$. Then the power of the test statistic $T_1$ can be
approximated using the following theorem.

Theorem 3. Under Conditions (A1)-(A7), if $n h_1 h_2 \delta_n^2 = O(1)$, then under
the alternative hypothesis $H_{1n}$,

    $(T_1 - \mu_1 - d_{1n})/\sigma_{1n} \xrightarrow{\mathcal{L}} N(0, 1),$

where $d_{1n} = n E\{g_n^2(X, Z) w(X, Z)\}(1 + o(1))$, and $\sigma_{1n} = \sqrt{\sigma_1^2 + 4\sigma_{1A}^2}$ with
$\sigma_{1A}^2 = n E[g_n^2(X, Z) w^2(X, Z) \{p(Z|X, 2\Delta) - p^2(Z|X, 2\Delta)\}^2]$.

Using Theorem 1, one can construct an approximate level-$\alpha$ test based
on $T_1$. Let $c_\alpha$ be the critical value such that

    $P\{(T_1 - \mu_1)/\sigma_1 \ge c_\alpha | H_0\} \le \alpha.$

Then we have the following result, which demonstrates that the test statistic
$T_1$ can detect alternatives at rate $\delta_n = O(n^{-2/5})$.
Theorem 4. Under Conditions (A1)-(A6), $T_1$ can detect alternatives
with rate $\delta_n = O(n^{-2/5})$ when $h_1 = c_1 n^{-1/5}$ and $h_2 = c_2 n^{-1/5}$ for some
constants $c_1$ and $c_2$. Specifically, if $\delta_n = d n^{-2/5}$ for a constant d, then:

(i) $\limsup_{d \to 0} \limsup_{n \to \infty} P\{(T_1 - \mu_1)/\sigma_1 \ge c_\alpha | H_{1n}\} \le \alpha$;

(ii) $\liminf_{d \to \infty} \liminf_{n \to \infty} P\{(T_1 - \mu_1)/\sigma_1 \ge c_\alpha | H_{1n}\} = 1$.
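The arithmetic behind the rate in Theorem 4 can be checked by hand. The sketch below uses only the orders of $\sigma_1^2$ (Theorem 1) and of $d_{1n}$ and $\sigma_{1A}^2$ (Theorem 3); it treats $\Omega_2$, the kernel norms and the weight function as fixed constants and assumes that the bracketed factor in $\sigma_{1A}^2$ is bounded, which is an assumption of this illustration rather than a statement taken from the proofs.

```latex
\begin{align*}
\sigma_1^2 &\asymp (h_1 h_2)^{-1} = (c_1 c_2)^{-1} n^{2/5}, \\
d_{1n} &\asymp n \delta_n^2 = d^2 n^{1/5}, \\
\sigma_{1A}^2 &= O(n \delta_n^2) = O(d^2 n^{1/5}) = o(\sigma_1^2), \\
\frac{d_{1n}}{\sigma_{1n}} &\asymp \frac{d^2 n^{1/5}}{(c_1 c_2)^{-1/2} n^{1/5}} = d^2 \sqrt{c_1 c_2}.
\end{align*}
```

The standardized drift is therefore of order $d^2$: it vanishes as $d \to 0$, in line with (i), and diverges as $d \to \infty$, in line with (ii).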

Similarly to (16), we consider the following alternative sequence to study
the power of the test statistic $T_2$:

    $H_{2n}: P(z|x, 2\Delta) - R(z|x, 2\Delta) = G_n(x, z),$

where $G_n(x, z)$ satisfies $E[G_n^2(X, Z)] = O(\rho_n^2)$ and
$\mathrm{var}(G_n^2(X, Z)) \le M (E[G_n^2(X, Z)])^2$ for a constant $M > 0$ and a sequence $\rho_n$
tending to zero. Then using the following theorem one can calculate the power of
the test statistic $T_2$.



Theorem 5. Under Conditions (Al)-(A6) and (A8), ifnhihsp^ = 0(1), 
then under the alternative hypothesis H^n, 



{T 2 -pL 2 -d 2n )/a 2n AaT(0,1), 

where d 2n = nE[G 2 (X, Z)uj(X)] + 0(nh\p n + Pnh^ 1 ), a\ n = a 2 + 4cr| A and 

2 



cj 2A = nE 

- nE 



j G n (X, Z)u;(X)I(Z <z)P(dz\X, 2A) 
J G n (X, Z)u(X)P(z\X, 2A)P(dz\X, 2A) 



In a manner parallel to Theorem 4, the following theorem demonstrates 
the optimality of the test. 



Theorem 6. Under Conditions (A1)-(A6), $T_2$ can detect alternatives
with rate $\rho_n = O(n^{-4/9})$ when $h_1 = c_* n^{-2/9}$ for some constant $c_*$.

From Theorem 6, $T_2$ can detect alternatives at rate $O(n^{-4/9})$. Using an
argument similar to [11], we can also establish the minimax rate, $O(n^{-4/9})$,
of the test. Note that the rate is optimal according to [26, 27] and [29].
Compared with Theorem 4, it is seen that $T_2$ is more powerful than $T_1$ for testing
the Markov hypothesis. This is due to the fact that the alternative under
consideration for $T_2$ is global, namely, the density under the alternative is
basically globally shifted away from the null hypothesis. On the other hand,
$T_1$ and $T_1^*$ are more powerful than $T_2$ for detecting local features of the
alternative hypothesis. We will now explore these features in simulations.
5. Simulations. An important application of our test methods is to verify
the Markov property in the context where the null model is a diffusion 
process, since it is often assumed in modern financial theory and practice 
that the observation process comes from an underlying diffusion. Hence, we 
consider simulations for the diffusion models. 

To use the test statistics, one needs to find their null distributions.
Theoretically the asymptotic null distributions may be used to determine the
p-values of the test statistics. However, in practical applications the
asymptotic distributions do not necessarily give accurate approximations, since
the local sample size $n h_1 h_2$ may not be large enough. This phenomenon
is shared by virtually all nonparametric kinds of tests where some form of
functional estimation is used.

We will mainly focus on the finite sample performance of the test statistic
$T_1^*$, since it possesses the Wilks property which facilitates bandwidth
selection and determination of the null distribution using a bootstrap method.
Since the asymptotic null distribution of $T_1^*$ is independent of nuisance
parameters/functions under the null hypothesis, for a finite sample it does not
sensitively depend on the nuisance parameters/functions. Therefore, the null
distribution can be approximated by bootstraps, by fixing nuisance
parameters/functions at their reasonable estimates, as in [12] in a different context.

In general, different bootstrap approximations to the null distributions 
are needed for different null models, partially due to the large family of 
null models with the Markov property. We will illustrate this method for 
the Ornstein-Uhlenbeck model, which in financial mathematics is used for 
instance as the [30] model for interest rates. For other parametric models, 
our approach can similarly be applied. 

The Ornstein-Uhlenbeck model employed as the null hypothesis is

(17)   $dX_t = \kappa(\alpha - X_t)\,dt + \sigma\,dW_t,$

where $W_t$ is a Brownian motion, and the parameters are set as $\kappa = 0.2$,
$\alpha = 0.085$, $\sigma = 0.08$, which are realistic for interest rates over long periods.
We simulated the model 1000 times. In each simulation, we draw a sample
with sample size n = 2400 and weekly sampling interval $\Delta = 1/52$ using
for this purpose a higher frequency Euler approximation, or an exact
discretization. The bandwidth selection for the test statistic $T_1^*$ is performed
using the simple empirical rule proposed by [25]. Alternative methods
include the cross-validation approaches of [15] and [20], but their computation
is intensive especially when repeated many times in Monte Carlo.
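The exact discretization mentioned above is straightforward for model (17), because the conditional law of $X_{t+\Delta}$ given $X_t$ is Gaussian with mean $\alpha + (X_t - \alpha)e^{-\kappa\Delta}$ and variance $\sigma^2(1 - e^{-2\kappa\Delta})/(2\kappa)$. The following sketch draws one sample path with the parameter values quoted in the text; the function name and the seed are hypothetical, and starting from the stationary law is a choice made for this illustration.

```python
import math
import random


def simulate_ou_exact(n_obs, kappa, alpha, sigma, delta, rng):
    """Sample the Ornstein-Uhlenbeck model on a grid of step `delta`, exactly."""
    decay = math.exp(-kappa * delta)
    stat_sd = sigma / math.sqrt(2.0 * kappa)          # stationary std deviation
    step_sd = stat_sd * math.sqrt(1.0 - decay ** 2)   # one-step conditional std
    x = rng.gauss(alpha, stat_sd)                     # start in the stationary law
    path = [x]
    for _ in range(n_obs - 1):
        x = alpha + (x - alpha) * decay + step_sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


rng = random.Random(2024)
# n + 2 = 2402 weekly observations give n = 2400 triples (X_i, Y_i, Z_i).
path = simulate_ou_exact(2402, 0.2, 0.085, 0.08, 1.0 / 52.0, rng)
```

Because the recursion is exact, no discretization bias enters the simulated null model, unlike with an Euler scheme.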

Given a sample from the model, we fit the model using the least squares
method and obtain the residuals of the fit, and then generate bootstrap
samples using the residual-based bootstrap method. For each simulation,
we obtained three bootstrap samples (this is merely for the reduction of
computation cost; using more samples will not fundamentally alter the
results) and computed the test statistic $T_1^*$ using the same bandwidths as the
original sample in the simulation. Pooling together the bootstrap samples
from each simulation, we obtained 3000 bootstrap statistics. Their sampling
distribution, computed via the kernel density estimate, is taken as the
distribution of the bootstrap method. By using the kernel density estimation
method, the distribution of the realized values of the test statistic $T_1^*$ in
simulations is obtained as the true distribution (except for the Monte Carlo
errors).
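One way to read the residual-based bootstrap step is through the discretized form of (17), which is a first-order autoregression $X_{i+1} = a + b X_i + e_i$. The sketch below is an assumption about the details, not the authors' code: it fits $a$ and $b$ by ordinary least squares, resamples the fitted residuals with replacement, and rebuilds a path from the same starting value. The AR(1) data and the function names are illustrative.

```python
import random


def fit_ar1_least_squares(path):
    """Least squares fit of X_{i+1} = intercept + slope * X_i + residual."""
    x, y = path[:-1], path[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, residuals


def residual_bootstrap_path(path, rng):
    """Generate one bootstrap path by resampling the fitted residuals."""
    intercept, slope, residuals = fit_ar1_least_squares(path)
    x = path[0]
    boot = [x]
    for _ in range(len(path) - 1):
        x = intercept + slope * x + rng.choice(residuals)
        boot.append(x)
    return boot


# Illustrative data only.
rng = random.Random(7)
path = [0.0]
for _ in range(500):
    path.append(0.9 * path[-1] + rng.gauss(0.0, 1.0))
intercept, slope, residuals = fit_ar1_least_squares(path)
boot = residual_bootstrap_path(path, rng)
```

The test statistic would then be recomputed on each bootstrap path with the bandwidths of the original sample, as described in the text.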

Figure 1 displays the estimated densities for T*. Not surprisingly, the 
bootstrapped distributions get much closer to the true ones as the sam- 
ple sizes increase. In our experience, the bootstrap approximations start to 
become adequate for sample sizes starting at about 2400. 

Fig. 1. Estimated densities of the statistics. Left panel: n = 1200; right panel: n = 2400. Solid: true; dotted: the bootstrap approximation.

To investigate the power of the test statistics, we employ various sequences
of alternatives indexed by a parameter $\theta$ = 0, 0.2, 0.4, 0.6, 0.8, 1.0. One of
the main ways for an otherwise Markovian model to become non-Markovian
is to restrict its state space too much. For instance, consider a bivariate
diffusion model. Taken jointly, the two components are Markovian, but taken
in isolation a single component may not be:

1. Alternative model with missing state variable in the drift: we first consider
the situation where the null model (17) is missing a state variable, in
this case X mean-reverts to the stochastic level $\theta\alpha_t + (1 - \theta)\alpha$ under the
alternative

    $H_{1\theta}: dX_t = \kappa(\theta\alpha_t + (1 - \theta)\alpha - X_t)\,dt + \sigma\,dW_t,$

where $\alpha_t$ is the random process

    $d\alpha_t = \kappa_1(\alpha_1 - \alpha_t)\,dt + \sigma_1\,dB_t,$

with $B_t$ a Brownian motion independent of $W_t$, $\kappa_1 = \kappa/s$, $\alpha_1 = s\alpha$,
and $\sigma_1 = \sigma/2$, with s = 100 and 10. When $\theta \ne 0$, the alternatives are
non-Markovian. The results in the first part of Table 1 show that the
test statistic rejects the null hypothesis when the observations are drawn
under $H_{1\theta}$.

2. Alternative model with missing state variable in volatility: next, we
consider alternative models where volatility is stochastic,

    $H_{2\theta}: dX_t = \kappa(\alpha - X_t)\,dt + ((1 - \theta)\sigma + \theta\sigma_t)\,dW_t,$

where $\sigma_t = \sqrt{Y_t}$ is a random process following the [9] model

    $dY_t = \kappa_2(b - Y_t)\,dt + \sigma_2 Y_t^{1/2}\,dB_{2t},$
where $B_{2t}$ is a standard Brownian motion independent of $W_t$, $\kappa_2 = \kappa/s$,
$b = s\alpha$ and $\sigma_2 = \sigma/2$, with s = 1000, 100 and 10. When $\theta \ne 0$, the
alternatives are also non-Markovian.



Table 1
Power of the test against $H_{1\theta}$

                            Parameter $\theta$
  s      Level $\alpha$     0.0     0.2     0.4     0.6     0.8     1.0
  100    0.01               0.011   1       1       1       1       1
         0.05               0.055   1       1       1       1       1
  10     0.01               0.011   0.010   0.070   0.228   0.580   0.846
         0.05               0.055   0.019   0.123   0.549   0.901   0.989








Table 2
Power of the test against $H_{2\theta}$

                            Parameter $\theta$
  s      Level $\alpha$     0.0     0.2     0.4     0.6     0.8     1.0
  1000   0.01               0.013   0.402   0.660   0.762   0.813   0.817
         0.05               0.067   0.557   0.768   0.845   0.878   0.905
  100    0.01               0.013   0.028   0.183   0.372   0.492   0.573
         0.05               0.067   0.098   0.340   0.527   0.627   0.697
  10     0.01               0.013   0.007   0.020   0.017   0.032   0.088
         0.05               0.067   0.037   0.052   0.070   0.122   0.218






Y. AIT-SAHALIA, J. FAN AND J. JIANG 



3. Alternative model with missing state variable in jumps: finally, we
consider a model with compound Poisson jumps

    $H_{3\theta}: dX_t = \kappa(\alpha - X_t)\,dt + \sigma\,dW_t + J_t\,dN_t(\theta),$

where $N_t(\theta)$ is a Poisson process with stochastic intensity $\theta$ and jump
size 1, while $J_t$ is the jump size. We will consider two types of jump
sizes:

(i) $J_t$ is independent of $\mathcal{F}_t$ and follows $N(0, \sigma_1^2)$ with $\sigma_1 = \sigma/2$, which
makes $H_{3\theta}$ Markovian;

(ii) $J_t$ follows the CIR model

    $dJ_t = \kappa(\alpha - J_t)\,dt + \sigma_1 J_t^{1/2}\,dB_{3t},$

where $B_{3t}$ is a standard Brownian motion independent of $W_t$, $\kappa = 0.2$,
$\alpha = 0.085$ and $\sigma_1 = 0.08/2$. Then $J_t$ is not independent of $\mathcal{F}_t$.
This leads to alternatives $H_{3\theta}$ which are not Markovian for $\theta \ne 0$.
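The alternatives listed above can be simulated with a fine Euler grid inside each sampling interval. As an illustration, the sketch below generates the jump alternative with type (i) jump sizes. It rests on several assumptions that are not spelled out in the text: $\theta$ is read as the expected number of jumps per unit of time, at most one jump is allowed per Euler substep, ten substeps per week are used, and the path starts at the level $\alpha$. The function name is hypothetical.

```python
import math
import random


def simulate_h3_type_i(theta, n_obs, rng, kappa=0.2, alpha=0.085, sigma=0.08,
                       delta=1.0 / 52.0, substeps=10):
    """Euler sketch of dX = kappa (alpha - X) dt + sigma dW + J dN(theta)."""
    sigma1 = sigma / 2.0               # std deviation of the N(0, sigma1^2) jumps
    dt = delta / substeps
    sqrt_dt = math.sqrt(dt)
    x = alpha                          # start at the mean level (an assumption)
    path = [x]
    n_jumps = 0
    for _ in range(n_obs - 1):
        for _ in range(substeps):
            x += kappa * (alpha - x) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
            if rng.random() < theta * dt:   # at most one jump per substep
                x += rng.gauss(0.0, sigma1)
                n_jumps += 1
        path.append(x)                 # keep only the weekly observations
    return path, n_jumps


jump_path, n_jumps = simulate_h3_type_i(1.0, 2402, random.Random(3))
```

With $\theta = 0$ the jump branch is never taken and the scheme reduces to an Euler approximation of the null model (17).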

The alternative models considered here are $\beta$-mixing. For example, in the
first alternative $H_{1\theta}$, the joint process $(X_t, \alpha_t)$ is an affine process and it is
$\beta$-mixing. Hence, $X_t$ is $\beta$-mixing. A similar argument applies to the two other
alternatives. In fact, for the first alternative $H_{1\theta}$, the time series
$(X_{i\Delta}, \alpha_{i\Delta})$ can be written as a bivariate autoregressive model. Hence, it is
$\beta$-mixing with the choice of parameters. Note that for all of the above
alternatives, when $\theta$ is small, the null and alternative models are nearly
impossible to differentiate. In the limit where $\theta = 0$, the null and the
alternative are identical. Therefore, it can be expected that, when $\theta = 0$, the
power of the test should be close to the significance level; and as $\theta$ deviates
more from 0, the power should increase. Also we can expect that our tests will be
able to detect only the type (ii) jumps but not the type (i) jumps, since for the
type (i) jumps the alternatives are Markovian.

The simulated powers are reported in Tables 1-3. The null distribution of the normalized test statistics does not depend sensitively on the choice of bandwidth, whereas the power depends on the choice of bandwidth and the alternative under consideration. As expected, our test is fairly powerful for detecting non-Markovian alternatives $H_{k\theta}$ ($k = 1, 2, 3$), at least in situations where the alternative is sufficiently far from the null. For $H_{3\theta}$, the test has, as it should, no power to identify the type (i) alternatives but is powerful for discriminating against the type (ii) alternatives. This illustrates well the sensitivity and specificity of our tests.
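
Power entries of this kind are rejection frequencies over simulated samples. The sketch below shows one way such a frequency could be computed from simulated values of a normalized statistic. It assumes a one-sided test whose critical value is the empirical $(1-\alpha)$ quantile of statistics simulated under the null. The function name `empirical_power` and this choice of critical value are illustrative assumptions, not a description of how the tables were produced.

```python
import numpy as np

def empirical_power(null_stats, alt_stats, level):
    """Rejection frequency of a one-sided test with a simulated critical value."""
    null_stats = np.asarray(null_stats, dtype=float)
    alt_stats = np.asarray(alt_stats, dtype=float)
    # Critical value: empirical (1 - level) quantile of the null statistics.
    critical_value = np.quantile(null_stats, 1.0 - level)
    # Power estimate: share of alternative statistics exceeding the critical value.
    return float(np.mean(alt_stats > critical_value))
```

When the alternative statistics have the same distribution as the null ones, the returned frequency is close to `level`, and it grows as the alternative statistics shift upward.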
Table 3
Power of the test against $H_{3\theta}$

                                Parameter $\theta$
Jump type   Level $\alpha$   0.0     0.2     0.4     0.6     0.8     1.0
(i)         0.01             0.010   0.009   0.023   0.003   0.016   0.009
            0.05             0.059   0.048   0.054   0.054   0.058   0.056
(ii)        0.01             0.010   0.514   0.774   0.888   0.940   0.951
            0.05             0.059   0.533   0.796   0.894   0.946   0.961

6. Technical proofs. 



6.1. Technical lemmas. We now introduce some technical lemmas, the 
proofs of which can be found in the supplemental material of this paper. To 
save space, some notation in the lemmas will appear later in the course of 
proofs of the main theorems. 

Lemma 1. Suppose that $W$ is symmetric and continuous with a bounded support. If $h \to 0$ and $nh \to \infty$, then

$$W_n(z,x;h) = \Big\{ \frac{1}{\mu_0(W)\pi(x)} - \frac{z}{h}\,\frac{h\pi'(x)}{\pi^2(x)\mu_0(W)} + O_p(\rho_n(h))\,\frac{z}{h} + O_p(\rho_n(h)) \Big\} W_h(z),$$

uniformly for $x \in \Omega^*$, where $O_p(\rho_n(h))$ does not depend on $z$, and where $\mu_0(W) = \int W(u)\,du$.
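
As an illustration of the weights that Lemma 1 approximates, the sketch below computes local linear equivalent-kernel weights in their standard textbook form, $W_n(z,x;h) = W_h(z)\{s_2 - z s_1\}/\{s_0 s_2 - s_1^2\}$ with $s_k = n^{-1}\sum_i (X_i - x)^k W_h(X_i - x)$. This form, the Epanechnikov kernel and the function name `local_linear_weights` are assumptions made for the example, since the definition of $W_n$ is given elsewhere in the paper.

```python
import numpy as np

def local_linear_weights(x_data, x, h):
    """Equivalent-kernel weights W_n(X_i - x, x; h) of a local linear fit.

    Standard form (assumed): W_h(z) * (s2 - z*s1) / (s0*s2 - s1**2),
    with an Epanechnikov kernel chosen for illustration.
    """
    z = np.asarray(x_data, dtype=float) - x
    u = z / h
    # W_h(z) = W(z/h) / h for the Epanechnikov kernel on [-1, 1].
    w_h = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0) / h
    s0, s1, s2 = (np.mean(w_h * z**k) for k in range(3))
    return w_h * (s2 - z * s1) / (s0 * s2 - s1**2)
```

By construction these weights average to one and are orthogonal to $X_i - x$, so the weighted average $n^{-1}\sum_i W_n(X_i - x, x; h) Y_i$ reproduces any linear function of $x$ exactly.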

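Lemma 2. Under Conditions (A1)-(A6):

(i) for $k = 0, 1$,

$$\sup_{(y,z)\in\Omega^*} \Big| \frac{1}{n}\sum_{i=1}^n b_1^{-k}(Y_i - y)^k W_{b_1}(Y_i - y)\varepsilon_i(z) \Big| = O_p\{\sqrt{(\log n)/(n b_1 b_2)}\};$$

(ii) for $k = 0, 1$,

$$\sup_{(x,z)\in\Omega^*} \Big| \frac{1}{n}\sum_{j=1}^n h_3^{-k}(X_j - x)^k W_{h_3}(X_j - x) e_j(z) \Big| = O_p\{\sqrt{\log(n)/(n h_3)}\};$$

(iii) $\sup_{(x,z)\in\Omega^*} \big| \frac{1}{n}\sum_{j=1}^n q^*(x, Z_j)\varepsilon_{j+1}(z) \big| = O_p\{\sqrt{\log(n)/(n b_2)}\}$;

(iv) $\sup_{(x,z)\in\Omega^*} \big| \frac{1}{n}\sum_{j=1}^n W_{h_1}(X_j - x)\varepsilon_j^*(z) \big| = O_p\{\sqrt{(\log n)/(n h_1 h_2)}\}$.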




Lemma 3. Under Conditions (Al)-(A6), we have 

I n 

in{x,y) = - ^ r nl (x,Yi)ei(z) = O v {sJn~ x b x togn), 
i=i 

uniformly for (x,z) G 0*, where r n \ is defined right after (32). 
Lemma 4. Suppose Conditions (Al)-(A5) hold. Then 



r ln (x,z)=n- l Y,<{x,Y i )E i {z) = 0{^[(b\ + hi)\ogn]/{nb 2 )}, 
i=l 

uniformly for (x,y) E Q* , where r*(-, •) is defined in (34). 

Lemma 5. Under Conditions (Al)-(A6): 
(i) Ei^KnWJ) ~ ~ m + m\ = o p (h^); 

(n) (n-i)tu\m-m)=o P (K i ). 

Lemma 6. Assume Conditions (Al)— (A5) hold. Then we have: 

(i) under Condition (A6), 

\n(n- 1)^(0) = n 11 \\W\\ 2 \\K\\ 2 /(h 1 h 2 ) - n 12 \\W\\ 2 /h x 

+ ni3\\w\\ 2 /h 3 -Q u \\w\\ 2 /h 3 

+ Q 15 \\K\\ 2 /b 2 + 0(n~ 2 y, 

(ii) under Condition (A7), 

in(n- 1)0(0) 

= 7^W W W 2 I ^(x){l + Qh 1 h 3 1 E[V(X A ,Z A )\X A = x]}dx + 0(l). 

Lemma 7. Assume that Conditions (Al)-(A5) hold. Then we have: 

(i) under Condition (A6), 

E ^(^,j) Aaa(o,i), 

l<i<j<n 

w/iere CT 2 n = 2tt 2 \\W * W\\ 2 \\K * K\\ 2 /(n 2 /ii/i 2 ); 

(ii) under Condition (A7), 

E ^(»'.J)AJV(0 1 1), 
l<i<j'<n 

w/iere cr 2 n = ||W * VF|| 2 ||u;|| 2 /(45ra 2 /ii). 
6.2. Preliminaries. Since the test statistics $T_1$ and $T^*$ compare the difference between $\hat p(z|x, 2\Delta)$ and $\hat r(z|x, 2\Delta)$, we derive an asymptotic expression for this difference under $H_0$ before giving the proofs of the theorems. In addition, in order to streamline our arguments, we will introduce some technical lemmas and put them behind the proofs of theorems. The arguments employed here use techniques from U-statistics and nonparametric smoothing.

First let us introduce some notation. Let $\rho_n(h) = h^2 + \sqrt{\log n/(nh)}$, $\mu_0(W) = \int W(x)\,dx$ and $\mu_2(W) = \int x^2 W(x)\,dx$. Denote by $m(y,z) = E\{K_{b_2}(Z_j - z)|Y_j = y\}$, $m^*(x,z) = E\{K_{h_2}(Z_j - z)|X_j = x\}$, $m_1(y,z) = \partial m(y,z)/\partial y$ and $m_1^*(x,z) = \partial m^*(x,z)/\partial x$.
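
For readers who want to see these quantities numerically, the sketch below evaluates the kernel moments $\mu_0(W)$ and $\mu_2(W)$ by a midpoint rule and computes the rate $\rho_n(h)$. It assumes an Epanechnikov kernel, which is only one example of a symmetric kernel with bounded support, and it reads the second term of $\rho_n(h)$ as $\sqrt{\log n/(nh)}$. The function names are illustrative.

```python
import math

def epanechnikov(u):
    """Epanechnikov kernel, a symmetric kernel with bounded support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kernel_moment(kernel, order, lo=-1.0, hi=1.0, n_grid=20000):
    """Midpoint-rule approximation of the integral of u**order * W(u) du."""
    step = (hi - lo) / n_grid
    total = 0.0
    for i in range(n_grid):
        u = lo + (i + 0.5) * step
        total += (u ** order) * kernel(u)
    return total * step

def rho_n(h, n):
    """rho_n(h) = h^2 + sqrt(log n / (n h)), under the reading stated above."""
    return h * h + math.sqrt(math.log(n) / (n * h))
```

For the Epanechnikov kernel the exact values are $\mu_0(W) = 1$ and $\mu_2(W) = 1/5$, which the numerical approximation reproduces to high accuracy.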

Using an elementary property of the local linear smoother (see, e.g., [13]), 
we obtain that 



(18) p{z\x, 2A) - p(z\x, 2A) = A* n (x, z) + B*(x, z 
where e*(z) = K h2 (Zj - z) - m*(Xj,z), 



) + C*(x,z) 





x)} 




where m%(x, z) = — \x=Sj ; an d Xj lies between Xj and x. By [14], it is 
easy to show that 




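$$B_n^*(x,z) = O_p(h_1^2) \quad\text{and}\quad C_n^*(x,z) = O_p(h_2^2), \tag{20}$$

uniformly for $(x,z) \in \Omega^*$. By the definition of $\hat r$, we have

$$\hat r(z|x, 2\Delta) - r(z|x, 2\Delta) = L_{n1}(x,z) + L_{n1}^*(x,z), \tag{21}$$

where

$$L_{n1}(x,z) = \frac{1}{n}\sum_{j=1}^n W_n(X_j - x, x; h_3)\{\hat p(z|Y_j, \Delta) - p(z|Y_j, \Delta)\},$$

$$L_{n1}^*(x,z) = \frac{1}{n}\sum_{j=1}^n W_n(X_j - x, x; h_3)\{p(z|Y_j, \Delta) - r(z|x, 2\Delta)\}.$$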
Subtracting (21) from (18), we obtain that, under $H_0: p(z|x, 2\Delta) = r(z|x, 2\Delta)$,

$$\hat p(z|x, 2\Delta) - \hat r(z|x, 2\Delta) = A_n^*(x,z) + B_n^*(x,z) + C_n^*(x,z) - L_{n1}(x,z) - L_{n2}(x,z) - L_{n3}(x,z), \tag{22}$$

where

$$L_{n2}(x,z) = n^{-1}\sum_{j=1}^n W_n(X_j - x, x; h_3)\{p(z|Y_j, \Delta) - r(z|X_j, 2\Delta)\},$$

$$L_{n3}(x,z) = n^{-1}\sum_{j=1}^n W_n(X_j - x, x; h_3)\{r(z|X_j, 2\Delta) - r(z|x, 2\Delta)\}.$$

By the continuity of $\partial^2 r(z|x, 2\Delta)/\partial x^2$, it is easy to show that

$$L_{n3}(x,z) = O_p(h_3^2) \quad\text{uniformly for } (x,z) \in \Omega^*. \tag{23}$$

Therefore, by (20), (22) and (23),

$$\hat p(z|x, 2\Delta) - \hat r(z|x, 2\Delta) = [A_n^*(x,z) - L_{n2}(x,z)] - L_{n1}(x,z) + O_p\Big(\sum_{i=1}^3 h_i^2\Big).$$

Let $e_j(z) = p(z|Y_j, \Delta) - r(z|X_j, 2\Delta)$. Then it can be rewritten that

$$L_{n2}(x,z) = \frac{1}{n}\sum_{j=1}^n W_n(X_j - x, x; h_3)e_j(z). \tag{25}$$

Note that $r(z|X_j, 2\Delta) = E\{p(z|Y_j, \Delta)|X_j\}$. It follows that $E[e_j(z)|X_j] = 0$ and $\mathrm{Var}[e_j(z)] = O(1)$ uniformly for $z$ and $j = 1, \ldots, n$. Applying Lemma 1 with $z = X_j - x$ and $h = h_3$, we obtain that

W n (Xj -x,x;h 3 ) 



(24) 



+ O p (p„(/i 3 ))^-^iy h 3(X i - x„ 



uniformly for x G O*, where O p (p n (h 3 )) does not depend on j. Therefore, 
£n20c,2) =i n 2i(a;,^) - £7122 , z) + L n23 (x,z) +L n24 (a;,z), 
where

1 n 

n 

L n23 (x,z) = O p (p n (h 3 ))n~ 1 '^2W h3 (Xj -x)ej(z), 

i=i 

A ,• — X , 



in.24(2;,z) = O p (p n (h 3 ))n 1 V" — ^- Wh 3 (Xj - x)ej(z) 

tl 3 

3=1 



By Lemma 2(ii), we have $L_{n21}(x,z) = O_p\{\sqrt{(\log n)/(nh_3)}\}$ and

$$L_{n22}(x,z) = O_p\{h_3\sqrt{(\log n)/(nh_3)}\} = O_p\{\sqrt{(h_3 \log n)/n}\},$$

uniformly for $(x,z) \in \Omega^*$. Then

$$L_{n23}(x,z) = O_p\{\rho_n(h_3)\sqrt{(\log n)/(nh_3)}\} = o_p\{\sqrt{(h_3 \log n)/n}\}$$

and $L_{n24}(x,z) = o_p\{\sqrt{(h_3 \log n)/n}\}$, uniformly for $(x,z) \in \Omega^*$. Then

$$L_{n2}(x,z) = \frac{1}{\mu_0\pi(x)}\,\frac{1}{n}\sum_{j=1}^n W_{h_3}(X_j - x)e_j(z) + O_p\{\sqrt{(h_3 \log n)/n}\},$$

uniformly for $(x,z) \in \Omega^*$. Note that from (19) and (25)

$$A_n^*(x,z) - L_{n2}(x,z) = \frac{1}{n}\sum_{j=1}^n [W_n(X_j - x, x; h_1)\varepsilon_j^*(z) \tag{27}$$

$$\qquad - W_n(Y_j - x, x; h_3)e_{j+1}(z)] + r_n(x,z), \tag{28}$$

where

$$r_n(x,z) = -\frac{1}{n}W_n(X_1 - x, x; h_3)e_1(z) + \frac{1}{n}W_n(Y_n - x, x; h_3)e_{n+1}(z),$$

which is of order $O_p(1/(nh_3)) = o_p\{\sqrt{(h_3 \log n)/n}\}$, uniformly for $(x,z) \in \Omega^*$. Let $\varepsilon_i(z) = K_{b_2}(Z_i - z) - m(Y_i, z)$. Then, similarly to (18), we have

$$\hat p(z|y, \Delta) - p(z|y, \Delta) = A_n(y,z) + B_n(y,z) + C_n(y,z), \tag{29}$$

where $A_n(y,z) = n^{-1}\sum_{i=1}^n W_n(Y_i - y, y; b_1)\varepsilon_i(z)$, $B_n(y,z) = O_p(b_1^2)$ and $C_n(y,z) = O_p(b_2^2)$, uniformly for $(y,z) \in \Omega^*$. It follows from the definition
of $L_{n1}$ that

$$L_{n1}(x,z) = n^{-1}\sum_{j=1}^n W_n(X_j - x, x; h_3)A_n(Y_j, z) + n^{-1}\sum_{j=1}^n W_n(X_j - x, x; h_3)[B_n(Y_j, z) + C_n(Y_j, z)]. \tag{30}$$



Using Lemma 1, we get 

Ai(y,z) =A nl (y,z) - A n2 (y,z) + A n3 (y,z) + A n4 (y,z), 

where 

1 n 

A n i(y,z) = /iQ7r ^ ^" 1 ^ W bl (Yi -y)Ei(z), 
A n2 (y, z) = ^f-n^ ^W bl (Y - y)e i (z), 

n 

A n3 (y, z) = O p ( / o n (6i))n- 1 ^ W bl (Y - y)si(z), 

i=l 

n Y — 

A n4 (y, z) = Opipnih))^ 1 £ -L-y-W bl (Yi - y)e i {z). 

i=i 1 

Using Lemma 2(i), we obtain that 

logn 



Ansiy, z ) = Op(p n {bi))O p 
and 



nb\b 2 



AnA(y, z) = Op(p n (bi))O p ^^/^ 
uniformly for (y,z) G $7*. Then 



logn 



A n (y,z) = A nl (y,z) - A n2 (y,z) + O p (p n (bi))yJ (logn)/(n&i& 2 ), 

uniformly for (y,z) £ This, combined with (30) and Condition (A6), 
yields that 

(31) L nl (x,z) = L n i 1 (x,z)-L nl2 (x,z) + L nl3 (x,z) + O p {(logn)/(nb 1 / )}, 
where L nll (x,z) = n" 1 YJj=i W n (Xj - x,x;h 3 )A nl (Yj, z), 

n 

L n i 2 (x,z) = n~ lJ ^2 W n (Xj - x,x;h 3 )A n2 (Yj,z), 
n

L nl3 (x,z) = n~ l ^W n (X j - x,x;h 3 )[B n (Yj,z) + C n (Yj,z)}. 
j'=i 



Note that, by Lemma 2(i), A n i(y, z) = O p {y/ (logn) / (nbib 2 )} , uniformly for 
(y,z) E Q*. Using Lemma 1, we obtain that 

(32) L„n(x, z) = M nll (x, z) + M nl2 (x, z) + O p {(logn)/(n^ /2 )}, 
where 

- - n . n 
n n 



M nl2 (x, z) = ^\- 2 jr £ ^-V* - x)W h - 

xn- 1 (Y j )e i (z). 



Let 



M* n (x, y) = n- 1 W h3 (Xj ~ x)W bl (y - Y j )%~ 1 (Y j ), 

3=1 

g n (x,y) = E[M* n (x,y)] and r nl (x, y) = M* n (x, y) - g n (x, y). Then 

1 1 n 1 1 n 

M nll (x,z) = 2 . -y^g n (x,Yi)ei(z) + 2 . . -} r nl (x,Yi)£i(z). 
W(x) n ~t l-W(x) n ^ 

By Lemma 3, 



1 1 



(33) M nn (x,z) = -^——-^2g n (x,Yi)ei(z) + O p {^J (b x log n)/n}, 
[i 7r{x) n 

uniformly for (x,z) E J)*. Similarly to Lemma 2(iii), the first term on the 
right-hand side of (33) is O p {yJ (logn)/(nb 2 )}, uniformly for (x,z) E f2*. 
Hence, 



sup \M nn (x,z)\ = O p {y/(logn)/(nb 2 )}. 

Similarly, we have 

sup \M nl2 (x,z)\ = O p {h 3 y/ (logn) / (nb 2 )} = O p {a/(6i log n)/n}. 

By the symmetry of the kernel function and Taylor's expansion, it can be 
shown that 

g n (x,y) = E[K- 1 (Y l )W bl (y-Y l )W h . i (X l -x)] 
= /xgp(y|x, A)7r(x)/7r(y) + 0(b\ + /if) 
= $ P *(x\y,A) + 0(bl + h 2 3 ), 
uniformly for $(x,y) \in \Omega^*$, where $p^*(x|y, \Delta)$ is the one-$\Delta$ transition density of
the reverse series {X n+ 2-i}™=i , that is, the conditional density of X\ given 
Y\ = y. Note that g n is a deterministic function. It follows that 

(34) g n (x,Y i )= f j 2 oP *(x\Y i ,A)+r* n (x,Y i ), 

where r* (x, Y{) is <r(5^)-measurable and is of order 0(b 2 + h 2 ) for (x, Yi) G Q,* . 
This combined with (33) leads to 

1 n 0(1) n 

L n u(x,z) = - y^q*(x,Yi)ei(z) H 'V\rl(x,Y^£ i (z) 

i=l i=l 

(35) 

+ O p ({logn/(n&? /2 )} + {b^log^/n} 1 ' 2 ), 
where q*(x,y) = p(y\x, A)/ir(y). The first term in (35) is obviously 



1 n ( 1 \ 



By Lemma 4, the second term in (35) is P (\J (bf + /i|) log(n)/(n&2)) 5 uni- 
formly for (x,z) G 0*. Then uniformly for (x,z) G £1*, 

1 n 

L nll (x,z) = -^ ( ?*(x,Z J )e m (z) + O p ({logn/(n6f 2 )} + {6 1 (logn)/n} 1 / 2 ). 

i=l 

In the same argument, L n i2(x,z) is dominated by L n n(x,z) and is of order 

b l L nll {x,z)=O p ({\ogn/{nb\ /2 )} + {b^ogn/n} 1 / 2 ), 
which combined with (31) leads to 

1 n 

L nl (x,z) = - q*(x,Zi)e i+ i(z) + L nl3 (x,z) 

(36) 

+ O p ({log n/(nbf 2 )} + {b x log n/n} 1 / 2 ) , 

uniformly for (x,z) G £1* . This together with (24) and (27) yields the follow- 
ing asymptotic expression: 

(37) p(z\x,2A) - f(z\x, 2A) = T nl (x,z) + T n2 (x,z) +T n3 (x,z) +T n4 (x,z), 
where 



1 n 

T n i(x,z) = - ^2[W n (Xj - x,x;hi)e*(z) 

W n (Yj - x,x;h s )e j+ i(z) - q*{x, Zj)e j+1 (z)], 



n 
n

T n2 (x,z) = n~ 

T n3 (x,z) = B*(x,z) + C*(x,z) + L n3 (x,z), 

T nA (x, z) = O p ({logn/(nhf 2 )} + {h logn/n} 1 / 2 + {log n / (nb 3 / 2 )}), 
uniformly for (x,z) £ il*. 

6.3. Proofs of theorems. We now give the proofs of our main results. 

Proof of Theorem 1. (i) Approximate $T_1$ by a U-statistic. Let $w_i = w(X_i, Z_i)$. By (37) and the definition of $T_1$, we have

$$T_1 = \sum_{i=1}^n w_i[T_{n1}(X_i, Z_i) + T_{n2}(X_i, Z_i) + T_{n3}(X_i, Z_i) + T_{n4}(X_i, Z_i)]^2$$

$$= \sum_{i=1}^n\sum_{k=1}^4 w_i T_{nk}^2(X_i, Z_i) + 2\sum_{i=1}^n w_i T_{n1}(X_i, Z_i)T_{n2}(X_i, Z_i)$$

$$\quad + 2\sum_{i=1}^n w_i T_{n1}(X_i, Z_i)T_{n3}(X_i, Z_i) + 2\sum_{i=1}^n w_i T_{n2}(X_i, Z_i)T_{n3}(X_i, Z_i)$$

$$\quad + 2\sum_{i=1}^n w_i[T_{n1}(X_i, Z_i) + T_{n2}(X_i, Z_i) + T_{n3}(X_i, Z_i)]T_{n4}(X_i, Z_i)$$

$$= T_{11} + T_{12} + T_{13} + T_{14} + T_{15}.$$



By Lemmas 1 and 2, T n i(x, z) = O p {y/ (logn) / (nhifi2)} ■ Note that T n 2(£, z) = 
O p (b\), T n3 (x,z) = O p (h\), uniformly for (x,z) G ft*. It is straightforward to 
verify that T14 = O p (nh\) = o(l/hi), T15 = o p (l / 'y 'h\hi) . Using the same ar- 
gument as for (B.2) in [3], we obtain Tyi = o p (l / \fhih~2) an d T\ 3 = o p (l/y / /ii/i2) 
Therefore, 



n 4 



i=l fc=l 



Note that 



yjWfT^2(Xj, Z») = O p (nh\) = o p (l/h\ 
i=i 

n 

^2wiT 2 3 (Xi, Zi) = Op(l/fn) 



i=i 
and

n 

^ j w i T 2 i {X i ,Z i ) = o p (l/h 1 ). 
i=l 

It follows that 

n 

Ti =^2w i T% l (Xi, Zj) + Qpjhi 1 ) 

i=i 

= f 1 + o p (/ i ^ 1 ). 

It can be rewritten that 

n 

f 1 = Y,^[B* nl (X u Z i )-B* n2 (X u Z i )-B n3 (X u Z l )} 2 , 
i=i 

where B* nl (x, z) = ± £™ =1 W n (Xj - x, x; h 1 )s*{z) 1 



B n2 



1 n 

(x,z) = ~y~]W n (Yj - x,x;h 3 )e j+ i(z) 



n 

3=1 



and 



1 ™ 

B n3 (x,z) = - ^2q*(x,Zj)e j+ i(2 



n 

3=1 



1 1 



n 



■J2p{Zj\x,A)Tr(x)Tr 1 (Z j )e j+ i(z). 



n n(x) ^— /J 

3=1 

Applying Lemmas 1 and 2 and using Condition (A5), we obtain that 

n 

Tl =J2 W ii B nl( X i, Z i) ~ Bn2(X l ,Z i ) - B n3 (Xi, ZJ} 2 + O p (h^) 
i=l 

where B nl (x, z) = ^ £" =1 W^Xj - x)e*(z) and 

1 1 n 

B n2 (x,z) = r^/^2 w h s ( Y j -x)e j+1 (z). 

V 1 3=1 

Hence, 

n 

T i = Yl MBni(Xi,Zi) - B n2 (X u Zi) - B n3 (X u Zi)} 2 + o p (h^) 



i=i 

Let j) = WfoiXj — Xi)e*(Zi) — Wh 3 (Yj — Xi)ej + i(Zi) - q(X h Zrfej. 
and 



if)(i,j,k)=n 2 WiiT 2 (Xi)£(i,j)£(i,k), 
where $q(x,z) = p(z|x, \Delta)\pi(x)/\pi(z) = p^*(x|z, \Delta)$. Then

n 

Ti= ^2 ^(^j'k) + °p( h i 1 )- 

i,j,k=l 

(ii) Derive the asymptotics using the asymptotic theory for the U-statistic. 
Let 

B n = ^2 {ip(i,3,k) +tp(i,k,j) +ip(j,i,k) 

i<j<k 

+ i>(j,k,i) +ip(k,i,j) +ip(k,j,i)}, 
B12 = ^2bP(i,3,j) +i>(j,i,j) +ip(j,3,i)] 



and 



B13 = ^V(v 



i=l 

Then 

(38) T 1 = B u +B 12 + B 13 + o p (h^ 1 ). 

Let $\psi^*(i,j,k) = \psi(i,j,k) + \psi(i,k,j) + \psi(j,i,k) + \psi(j,k,i) + \psi(k,i,j) + \psi(k,j,i)$. Then $\psi^*(i,j,k)$ is symmetric in $(i,j,k)$, and hence $B_{11} = \sum_{i<j<k}\psi^*(i,j,k)$. Using Hoeffding's decomposition, we obtain that

(39) B n = ^ *(i,i J A;) + (n-2) ]T 

i<j<k l<i<j'<n 

where 

*(i,j,fc)=V'*(i ) j,fe)-^*(»,i)-V'*(i ) fc)-^0' ) fc) J 
i>*(i,j) = I if>*(h3,k)dF(x k ,y k ,z k ) and F is the distribution of Y k ,Z k ). 
Applying the lemma with 5 = 1/3 in [19], we can show that -£?{X^<?<fc ^ 
A;)} 2 = o(/i^ 2 ). Therefore, the first term on the right-hand side of (39) is 
Op(/ij~ 1 ), so that 

$$B_{11} = (n-2)\sum_{1\le i<j\le n}\psi^*(i,j) + o_p(h_1^{-1}). \tag{40}$$

By the Markovian property of $\{X_{i\Delta}\}$, $E[\psi^*(i,j)] = 0$. Hence, up to an ignorable term of order $o_p(h_1^{-1})$, $B_{11}$ is a U-statistic with mean zero. Define $\tilde\psi(i,j) = \psi(i,i,j) + \psi(j,j,i) + \psi(j,i,j) + \psi(i,j,j)$, $\tilde\psi(i) = \int \tilde\psi(i,j)\,dF(x_j, y_j, z_j)$ and $\tilde\psi(0) = E[\tilde\psi(i)]$. Then we have

$$B_{12} = \sum_{1\le i<j\le n}\tilde\psi(i,j).$$
l<i<j'<n 
Since $\tilde\psi(i,j)$ is a symmetric kernel, using the Hoeffding decomposition, we obtain that

$$B_{12} = \sum_{1\le i<j\le n}[\tilde\psi(i,j) - \tilde\psi(i) - \tilde\psi(j) + \tilde\psi(0)] + (n-1)\sum_{i=1}^n[\tilde\psi(i) - \tilde\psi(0)] + \tfrac{1}{2}n(n-1)\tilde\psi(0). \tag{41}$$

By Lemma 5,

$$B_{12} = \tfrac{1}{2}n(n-1)\tilde\psi(0) + o_p(h_1^{-1}). \tag{42}$$

Note that $B_{13} \ge 0$. By straightforward calculation on the mean of $B_{13}$, it can be shown that

$$B_{13} = O_p(n/(n^2h_1^2h_2^2)) = o_p(h_1^{-1}). \tag{43}$$

Therefore, a combination of (38) and (40)-(43) leads to

$$T_1 = \tfrac{1}{2}n(n-1)\tilde\psi(0) + (n-2)\sum_{1\le i<j\le n}\psi^*(i,j) + o_p(h_1^{-1}). \tag{44}$$

By Lemma 6(i), 

\n(n- 1)^(0) =/ii+o p (/ i ^ 1 ). 
Applying Lemma 7(i), we obtain that 

(n-^flulMA^l), 

i<j 

where <rf = 217 2 1| * VT|| 2 ||-K * K\\ 2 /(hih 2 ). Therefore, the result of this 
theorem holds. □ 

Proof of Theorem 2. The proof is similar to that of Theorem 1.

(i) Asymptotic expression for $\hat P(z|x, 2\Delta) - \hat R(z|x, 2\Delta)$. By the definitions in (13) and (14),

$$\hat P(z|x, 2\Delta) - P(z|x, 2\Delta) = \frac{1}{n}\sum_{i=1}^n W_n(X_i - x, x; h_1)[I(Z_i < z) - P(z|x, 2\Delta)], \tag{45}$$

$$\hat R(z|x, 2\Delta) - R(z|x, 2\Delta) = S_{n1}(x,z) + S_{n2}(x,z), \tag{46}$$

where $S_{n1}(x,z) = n^{-1}\sum_{i=1}^n W_n(X_i - x, x; h_3)[\hat P(z|Y_i, \Delta) - P(z|Y_i, \Delta)]$ and

$$S_{n2}(x,z) = n^{-1}\sum_{i=1}^n W_n(X_i - x, x; h_3)[P(z|Y_i, \Delta) - R(z|x, 2\Delta)]. \tag{47}$$

i=i 
Let $u_i(z, \Delta) = I(Z_i < z) - P(z|Y_i, \Delta)$. Then $E[u_i(z, \Delta)] = 0$. By (5),

$$\hat P(z|y, \Delta) - P(z|y, \Delta) = n^{-1}\sum_{i=1}^n W_n(Y_i - y, y; b_1)[I(Z_i < z) - P(z|y, \Delta)].$$

This can be rewritten as

$$\hat P(z|y, \Delta) - P(z|y, \Delta) = P_{n1}(y,z) + P_{n2}(y,z), \tag{48}$$

where

$$P_{n1}(y,z) = n^{-1}\sum_{i=1}^n W_n(Y_i - y, y; b_1)u_i(z, \Delta),$$

$$P_{n2}(y,z) = n^{-1}\sum_{i=1}^n W_n(Y_i - y, y; b_1)[P(z|Y_i, \Delta) - P(z|y, \Delta)].$$

By Lemma 1 and the symmetry of the kernel function $W(\cdot)$, and by using Taylor's expansion, it is easy to show that

$$P_{n2}(y,z) = (\partial^2/\partial y^2)P(z|y, \Delta)\,b_1^2 + o_p(b_1^2) = O_p(b_1^2), \tag{49}$$

uniformly for $(y,z) \in \Omega^*$. Hence,

$$\hat P(z|y, \Delta) - P(z|y, \Delta) = P_{n1}(y,z) + O_p(b_1^2), \tag{50}$$

uniformly for $(y,z) \in \Omega^*$. Then

$$S_{n1}(x,z) = n^{-1}\sum_{i=1}^n W_n(X_i - x, x; h_3)P_{n1}(Y_i, z) + O_p(b_1^2), \tag{51}$$

uniformly for $(x,z) \in \Omega^*$. Using the same arguments as those for $L_{n11}(x,z)$ between (32) and (37), we obtain that

1 - 

S n i(x,z) = ~y^q* (x,Yi)ui(z, A) 

i=l 

+ O p ({logn/(nbl /2 )} + {b^ogr^/n} 1 ' 2 ) 
^ i „ 

= -^2q*(x,Zi)u i+ i{z,A) 
n i=i 

+ O p (^+{h(lo g n)/n}^y 

Rewrite $S_{n2}(x,z)$ as

$$S_{n2}(x,z) = S_{n21}(x,z) + S_{n22}(x,z),$$

where

$$S_{n21}(x,z) = n^{-1}\sum_{i=1}^n W_n(X_i - x, x; h_3)[P(z|Y_i, \Delta) - R(z|X_i, 2\Delta)],$$

$$S_{n22}(x,z) = n^{-1}\sum_{i=1}^n W_n(X_i - x, x; h_3)[R(z|X_i, 2\Delta) - R(z|x, 2\Delta)].$$

By the continuity of $\partial^2 R(z|x, 2\Delta)/\partial x^2$ and the same argument as that for (49), $S_{n22}(x,z) = O_p(h_3^2)$, uniformly for $(x,z) \in \Omega^*$. Let $e_i^*(z) = P(z|Y_i, \Delta) - R(z|X_i, 2\Delta)$. Then $E[e_i^*(z)|X_i] = 0$, and

$$S_{n2}(x,z) = n^{-1}\sum_{i=1}^n W_n(X_i - x, x; h_3)e_i^*(z) + O_p(h_3^2) = n^{-1}\sum_{i=1}^n W_n(Y_i - x, x; h_3)e_{i+1}^*(z) + O_p(h_3^2). \tag{53}$$



By (45) and (46), under $H_0$, we have

$$\hat P(z|x, 2\Delta) - \hat R(z|x, 2\Delta) = -S_{n1}(x,z) - S_{n2}(x,z) + S_{n3}(x,z), \tag{54}$$

where, with $u_i^*(z, 2\Delta) = I(Z_i < z) - P(z|X_i, 2\Delta)$,

$$S_{n3}(x,z) = \frac{1}{n}\sum_{i=1}^n W_n(X_i - x, x; h_1)[I(Z_i < z) - P(z|x, 2\Delta)]$$

$$= \frac{1}{n}\sum_{i=1}^n W_n(X_i - x, x; h_1)u_i^*(z, 2\Delta) + \frac{1}{n}\sum_{i=1}^n W_n(X_i - x, x; h_1)[P(z|X_i, 2\Delta) - P(z|x, 2\Delta)].$$

Similarly to (49), the second term above is of order $O_p(h_1^2)$, so that

$$S_{n3}(x,z) = \frac{1}{n}\sum_{i=1}^n W_n(X_i - x, x; h_1)u_i^*(z, 2\Delta) + O_p(h_1^2), \tag{55}$$

uniformly for $(x,z) \in \Omega^*$. A combination of (52)-(55) yields that

$$\hat P(z|x, 2\Delta) - \hat R(z|x, 2\Delta) = T_{n1}^*(x,z) + T_{n2}^*(x,z) + T_{n3}^*(x,z), \tag{56}$$
where 



$$T_{n1}^*(x,z) = \frac{1}{n}\sum_{j=1}^n [W_n(X_j - x, x; h_1)u_j^*(z, 2\Delta) - W_n(Y_j - x, x; h_3)e_{j+1}^*(z) - q^*(x, Z_j)u_{j+1}(z, \Delta)],$$

$$T_{n2}^*(x,z) = O_p(b_1^2 + h_1^2 + h_3^2),$$

uniformly for $(x,z) \in \Omega^*$, and

$$T_{n3}^*(x,z) = O_p(\{\log n/(nb_1^{3/2})\} + \{b_1(\log n)/n\}^{1/2}),$$

uniformly for $(x,z) \in \Omega^*$.

(ii) Asymptotic normality 0/T2. Similar to (44), we have 

(57) T 3 = in(n - 1)0(0) + (n- 2) £ j) + o^/i" 1 ), 

l<j<j'<n 

where 0(0) and cp*(i,j) are defined the same as ip(0) and ip*(i,j), respec- 
tively, but with tp replaced by 

<p(i,j,k) = n 2 WiTT~ 2 (Xi)r](i,j)rj(i,k), 

where 

riihj) = W hl {Xj - Xi)^(Zi,2A) - W/JY- - A^)e* + i(^) 
- q(Xi,Zj)u j+1 (Zi, A). 
By Lemma 6 (ii) , we have 

(58) \n{n - 1)0(0) = /i 2 + o p (/i^ 1 ). 
By Lemma 7(ii), we have 

(59) (n-2)^0*(z,j)M AJV(0,1). 

A combination of (57)-(59) completes the proof of the theorem. □ 

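Proof of Theorem 3. Under $H_{1n}$, $p(z|x, 2\Delta) = r(z|x, 2\Delta) + g_n(x,z)$. Similarly to (22), we have under $H_{1n}$

$$\hat p(z|x, 2\Delta) - \hat r(z|x, 2\Delta) = Q_n(x,z) + g_n(x,z),$$

where

$$Q_n(x,z) = A_n^*(x,z) + B_n^*(x,z) + C_n^*(x,z) - L_{n1}(x,z) - L_{n2}(x,z) - L_{n3}(x,z).$$

Then

$$T_1 = \sum_{i=1}^n Q_n^2(X_i, Z_i)w_i + \sum_{i=1}^n g_n^2(X_i, Z_i)w_i + 2\sum_{i=1}^n g_n(X_i, Z_i)Q_n(X_i, Z_i)w_i. \tag{60}$$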
Since b\ = 0( nf ^ h ), it can be shown that 

n 

(61) £&(X i ,Z i )w i = nE\&(X,Zr)w(X,Z)]+o p (l/y/kj^). 



i=i 



By (20) and (23), B*{x,z) = O p (h\), C*(x,z) = O p (h 2 2 ) and L n3 (x,z) = 
Op(h^), uniformly for (x,z) It follows from the Holder inequality that 

n 

2^2w i g n (X i , Zi)[B*(Xi, Zi) + C*(X h Z t ) - L n3 (Xi,Zi)} 

i=l 

(62) 

= O p {n5 n {h\ + hl + hl)). 
A combination of (60)~(62) yields that 

n 

Ti = Yl Q 2 n(^,Zi)wi + nE[g 2 n (X, Z)w(X, Z)\ 
i=i 



+ 2^9n(Xi, Zi)wi[A^(Xi, Zi) - L n2 {X u Zi) - L nl (X u Zi)] 



(63) 



i=l 



+ {o P (l/VhJi2) + O p (n8 n {h\ + h% + /if))} 



Tn + T 12 + Ti 3 + apil/Vfahz). 



$T_{11}$ can be dealt with in the same way as in the proof of Theorem 1. It is asymptotically normal with mean $\mu_1$ and variance $\sigma_1^2$ given in Theorem 1. By the definition, $T_{12} = d_{1n}$. We now study the third term $T_{13}$. By (27) and (36), $T_{13}$ admits the following decomposition:
1 n 

-T13 = J29n(X i ,Z i )w i [A* n (X i ,Z i ) - L n2 (Xi, Z») - L nl {X h Z,)\ 

8=1 

n n 

= Y,9n{Xi,Zi)wi-Y,{W n {Xj - X^XvMVjiZi) 



n 

i=i j=i 



-W n (Yj - X i -X i ;h 3 )e j+1 (Z i ) 

+ Op(l/y/KJ^) + 0(n5 n {b\ + + OiSntq 1 ^ 1 ) 
-9n(X l , Z i )tu i 7r- 1 (X i ){W hl (Xj - Xi)^(Zi) 



. n 
+ o p (l/Vh^h 2 ) + 0{n5 n {b\ + bl)) + 0{5 n h^h^) 



= ]T <p(i,j) + o p {l/^ff^h 2 ) + 0(n5 n {b\ + b 2 2 )) + OiSn/Quhi)). 

The first term above is a [/-statistic with the typical element ip(i,j). Let 
= t p{hj) + fiii^)- Then ip*(i,j) is a symmetric kernel and 

Ti 3 = Yl ^(i,i) + 0(^/(/ii/i 2 )) + o p (i/ v / M^). 

l<i<j<n 

Put <p(i) = f ip*(i,j)dFj and <p(i,j) = f*{i,j) - (p{i) - <p(j). Then by the 
Hoeffding decomposition, we have 

n 

It is easy to show that £[/ii/i 2 <^(i, j)] 2(1+<5) = 0(5n (1+<5) n- 2 ( 1+<5 )/ii/i 2 ). There- 
fore, applying the lemma with 5 = 1 of [19], we obtain that 

E { E V&i)} =o(l/(hih 2 )). 



" l<j<j<n 



Therefore, 



(64) Ti 3 = (n - 1) <p(i) + O p {l/y/hJT 2 ) + OiSJih^)). 

i=l 

By the definition of <pi, it can be written that 
2 

(p(i) = -g n (Xi, Zi)w(Xi, Z^tt" 1 (Xi) 
n 

x f {W hl (xj - XiXjiZi) - W h3 (y, - Xifa+^Zi) 

-q*{X i ,z j )e j+l {Z i )}dF j 

= <pi(i) + $ 2 (i)+(pz{i), 



where 



<h(i) = lg n (X i ,Z i )w(X i ,Z i )n- 1 (X i ) J W hl ( Xj - X^Z-) dF h 
<h(i) = -~9n{Xi, Zi)w(Xi, Z^iXi) j W h3 ( Vj - Xi)e j+1 (Zi) dFj 
and <p 3 (i) = -%g n (Xi, Z^)w{X u Z i )TT~ 1 (X i ) J q*(Xi,z j )£ j+ i(Z i )dF j . Then by 
the Fubini theorem and by taking iterative expectation, E[<p(i)] = 0. Using 
the central limit theorem for the /3-mixing process, we get 



i=l 



where a\ A = \nE[(n — l) 2 ip 2 (i)]. By directly calculating the integration, it 
can be shown that 

<P!(i) = -g n (Xi,Zi)w(Xi,Zi)\p(Zi\Xi,2A) - p 2 (Z i \X i ,2A)](l + o(l)), 
n 

<p 2 (i) = o(g n {Xi,Zi)/n) and <p 3 (i) = o{g n {X il Z { ) /n). Therefore, 

a 2 1A = nE[g 2 n (X 1 ,Z 1 )w 2 (X 1 ,Z 1 ){p(Z l \X i ,2A)-p 2 (Z i \X i ,2A)} 2 ] 
+ o(l/{h 1 h 2 )). 

By straightforward calculation, it can be shown that the covariance between $T_{11}$ and $T_{13}$ can be ignored. It follows that the result of the theorem holds. □

Proof of Theorem 4. (i) For any given small $\eta > 0$, when $d$ is small enough, $|d_{1n}/\sigma_{1n}| < \eta$ and $\sigma_{1n} = \sigma_1(1 + o(1))$. Under $H_0$, with the selected bandwidths,

$$(T_1 - \mu_1)/\sigma_1 = O_p(1).$$

Therefore, the sequence of critical values $c_\alpha$ (depending on $n$) is bounded in probability. Similarly, under $H_{1n}$, with the selected bandwidths,

$$(T_1 - \mu_1 - d_{1n})/\sigma_{1n} = O_p(1). \tag{65}$$

Note that

$$P\{(T_1 - \mu_1)/\sigma_1 > c_\alpha | H_{1n}\} = P\{(T_1 - \mu_1 - d_{1n})/\sigma_{1n} > (c_\alpha\sigma_1 - d_{1n})/\sigma_{1n} | H_{1n}\}$$

$$\le P\{(T_1 - \mu_1 - d_{1n})/\sigma_{1n} > c_\alpha\sigma_1/\sigma_{1n} - \eta | H_{1n}\}.$$

It follows from Theorem 3 and Slutsky's theorem that

$$\limsup_{d\to 0}\,\limsup_{n\to\infty} P\{(T_1 - \mu_1)/\sigma_1 > c_\alpha | H_{1n}\} \le \alpha.$$

(ii) For any given $M > 0$, by taking $d$ sufficiently large, there exists an $N$ such that, when $n > N$, $d_{1n}/\sigma_{1n} > M$. Therefore,

$$P\{(T_1 - \mu_1)/\sigma_1 > c_\alpha | H_{1n}\} \ge P\{(T_1 - \mu_1 - d_{1n})/\sigma_{1n} > c_\alpha\sigma_1/\sigma_{1n} - M | H_{1n}\}.$$

By (65), we have

$$\liminf_{d\to\infty}\,\liminf_{n\to\infty} P\{(T_1 - \mu_1)/\sigma_1 > c_\alpha | H_{1n}\} = 1. \qquad □$$
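
The two limits in this proof can be visualized with a normal approximation to the rejection probability, $1 - \Phi((c_\alpha\sigma_1 - d_{1n})/\sigma_{1n})$. The sketch below is only an illustration. It assumes that the critical value $c_\alpha$ is the standard normal $(1-\alpha)$ quantile and that the standardized statistic under $H_{1n}$ is exactly standard normal, neither of which is claimed by the proof, and the function name `approx_power` is hypothetical.

```python
from statistics import NormalDist

def approx_power(level, d1n, sigma1, sigma1n):
    """Normal approximation to P{(T1 - mu1)/sigma1 > c_alpha | H_1n}."""
    std_normal = NormalDist()
    # Asymptotic critical value taken as a normal quantile (assumption).
    c_alpha = std_normal.inv_cdf(1.0 - level)
    # Rewrite the event in terms of the statistic standardized under H_1n.
    return 1.0 - std_normal.cdf((c_alpha * sigma1 - d1n) / sigma1n)
```

With `d1n = 0` and `sigma1n = sigma1` the approximation returns the level itself, and it tends to one as `d1n / sigma1n` grows, matching parts (i) and (ii).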

Proof of Theorems 5 and 6. We put the proofs in the supplemental 
materials [2]. □ 
SUPPLEMENTARY MATERIAL

Supplement: Additional technical details (DOI: 10.1214/09-AOS763SUPP; .pdf). We provide detailed proofs for Lemmas 1-7 and Theorems 5-6. Modern nonparametric smoothing techniques and the theory of U-statistics are used.

REFERENCES 

[1] Aït-Sahalia, Y. (1996). Testing continuous-time models of the spot interest rate. Review of Financial Studies 9 385-426.
[2] Aït-Sahalia, Y., Fan, J. and Jiang, J. (2010). Supplement to "Nonparametric tests of the Markov hypothesis in continuous-time models." DOI: 10.1214/09-AOS763SUPP.
[3] Aït-Sahalia, Y., Fan, J. and Peng, H. (2009). Nonparametric transition-based tests for jump-diffusions. J. Amer. Statist. Assoc. 104 1102-1116.
[4] Azzalini, A., Bowman, A. N. and Härdle, W. (1989). On the use of nonparametric regression for model checking. Biometrika 76 1-11. MR0991417
[5] Bickel, P. J. and Rosenblatt, M. (1973). On some global measures of the deviation of density function estimates. Ann. Statist. 1 1071-1095. MR0348906
[6] Caverhill, A. (1994). When is the short rate Markovian? Math. Finance 4 305-312. MR1299241
[7] Chen, S. X. and Gao, J. (2004). On the use of the kernel method for specification tests of diffusion models. Technical report, Iowa State Univ.
[8] Chen, S. X., Gao, J. and Tang, C. (2008). A test for model specification of diffusion processes. Ann. Statist. 36 167-198. MR2387968
[9] Cox, J. C., Ingersoll, J. E. and Ross, S. A. (1985). A theory of the term structure of interest rates. Econometrica 53 385-408. MR0785475
[10] Fan, J. (1996). Test of significance based on wavelet thresholding and Neyman's truncation. J. Amer. Statist. Assoc. 91 674-688. MR1395735
[11] Fan, J. and Jiang, J. (2005). Generalized likelihood ratio tests for additive models. J. Amer. Statist. Assoc. 100 890-907. MR2201017
[12] Fan, J. and Jiang, J. (2007). Nonparametric inference with generalized likelihood ratio tests (with discussion). Test 16 409-478. MR2365172
[13] Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New York. MR1964455
[14] Fan, J., Yao, Q. and Tong, H. (1996). Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika 83 189-206. MR1399164
[15] Fan, J. and Yim, T.-H. (2004). A data-driven method for estimating conditional densities. Biometrika 91 819-834. MR2126035
[16] Fan, J., Zhang, C. and Zhang, J. (2001). Generalized likelihood ratio statistics and Wilks phenomenon. Ann. Statist. 29 153-193. MR1833962
[17] Fouque, J.-P., Papanicolaou, G. and Sircar, K. R. (2000). Derivatives in Financial Markets with Stochastic Volatility. Cambridge Univ. Press, London. MR1768877





[18] Gao, J. and Casas, I. (2008). Specification testing in discretized diffusion models: Theory and practice. J. Econometrics 147 131-140. MR2472987
[19] Gao, J. and King, M. (2004). Model specification testing in nonparametric and semiparametric time series econometrics. Technical report, Univ. Western Australia.
[20] Hall, P., Racine, J. and Li, Q. (2004). Cross-validation and the estimation of conditional probability densities. J. Amer. Statist. Assoc. 99 1015-1026. MR2109491
[21] Härdle, W. and Mammen, E. (1993). Comparing nonparametric versus parametric regression fits. Ann. Statist. 21 1926-1947. MR1245774
[22] Heath, D., Jarrow, R. and Morton, A. (1992). Bond pricing and the term structure of interest rates: A new methodology for contingent claims evaluation. Econometrica 60 77-105.
[23] Heston, S. (1993). A closed-form solution for options with stochastic volatility with applications to bonds and currency options. Review of Financial Studies 6 327-343.
[24] Hong, Y. and Li, H. (2005). Nonparametric specification testing for continuous-time models with applications to term structure of interest rates. Review of Financial Studies 18 37-84.
[25] Hyndman, R. and Yao, Q. (2002). Nonparametric estimation and symmetry tests for conditional density functions. J. Nonparametr. Statist. 14 259-278. MR1905751
[26] Ingster, Y. (1993). Asymptotically minimax hypothesis testing for nonparametric alternatives I-III. Math. Methods Statist. 2 85-114; 3 171-189; 4 249-268.
[27] Lepski, O. and Spokoiny, V. (1999). Minimax nonparametric hypothesis testing: The case of an inhomogeneous alternative. Bernoulli 5 333-358. MR1681702
[28] Revuz, D. and Yor, M. (1994). Continuous Martingales and Brownian Motion, 2nd ed. Springer, Berlin. MR1303781
[29] Spokoiny, V. G. (1996). Adaptive hypothesis testing using wavelets. Ann. Statist. 24 2477-2498. MR1425962
[30] Vasicek, O. (1977). An equilibrium characterization of the term structure. Journal of Financial Economics 5 177-188.

Y. Aït-Sahalia
Department of Economics
Princeton University
Princeton, New Jersey 08544
USA
and
NBER
1050 Massachusetts Ave.
Cambridge, Massachusetts 02138
USA
E-mail: yacine@princeton.edu

J. Fan
Department of ORFE
Princeton University
Princeton, New Jersey 08544
USA
E-mail: jqfan@princeton.edu

J. Jiang
Department of Mathematics and Statistics
University of North Carolina at Charlotte
Charlotte, North Carolina 28223
USA
E-mail: jjiangl@uncc.edu



